3.3. Results

Every Job instance has an associated Results instance, created automatically on job creation and stored in its results attribute. The goal of Results is to take care of the job folder after the job has finished executing: gather information about the produced files, help manage them and extract data of interest from them. From a technical standpoint, Results is the part of the job running mechanism that is responsible for thread safety and proper synchronization in parallel job execution.

3.3.1. Files in the job folder

Directly after the execution of a job is finished (see Running a job), the job folder is scanned by the collect() method. All files present there, including files in subfolders, are gathered in a list stored in the files attribute of the Results instance. Entries in this list are paths relative to the job folder, so files on the top level are stored by their names and files in subfolders as, for example, childjob/childjob.out.

Note

Files produced by Pickling are excluded from this mechanism. Every file with .dill extension is simply ignored by Results.
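The scan described above can be pictured with a short standard-library sketch. It is not the PLAMS implementation: the function name scan_job_folder and the demo folder layout are hypothetical, and the code only illustrates the two points made in the text, namely that entries are relative paths and that .dill files are skipped.

```python
import os
import tempfile

def scan_job_folder(folder):
    """Return sorted paths (relative to folder) of all files, skipping *.dill."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            if name.endswith('.dill'):
                continue  # pickle files are ignored by the scan
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, folder))
    return sorted(found)

# Demo on a throwaway folder that mimics a job folder with one child job.
with tempfile.TemporaryDirectory() as job_folder:
    os.makedirs(os.path.join(job_folder, 'childjob'))
    for rel in ('plamsjob.out', 'plamsjob.dill',
                os.path.join('childjob', 'childjob.out')):
        open(os.path.join(job_folder, rel), 'w').close()
    files = scan_job_folder(job_folder)

print(files)
```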

If you need an absolute path to some file, the bracket notation known from dictionaries is defined for Results objects. When supplied with an entry from files list, it returns the absolute path to that file. This mechanism is read-only:

>>> r = j.run()
>>> print(r.files)
['plamsjob.err', 'plamsjob.in', 'plamsjob.out', 'plamsjob.run']
>>> print(r['plamsjob.out'])
/home/user/plams.12345/plamsjob/plamsjob.out
>>> r['newfile.txt'] = '/home/user/abc.txt'
TypeError: 'Results' object does not support item assignment

In the bracket notation and in every other context regarding Results, whenever you need to pass a string with a filename, shortcut $JN can be used for the job name:

>>> r.rename('$JN.out', 'outputfile')
>>> r.grep_file('$JN.err', 'NORMAL TERMINATION')
>>> print(r['$JN.run'])
/home/user/plams.12345/plamsjob/plamsjob.run
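As a rough illustration of the read-only bracket notation and the $JN shortcut, the sketch below defines a small stand-in class. The class name FolderView is hypothetical, and raising KeyError for an unknown file is a choice made for this sketch only; it is not meant to mirror how Results reports a missing file.

```python
import os

class FolderView:
    """Hypothetical stand-in for the path lookup part of Results."""

    def __init__(self, folder, jobname, files):
        self.folder = folder
        self.jobname = jobname
        self.files = list(files)

    def __getitem__(self, name):
        name = name.replace('$JN', self.jobname)  # expand the job-name shortcut
        if name not in self.files:
            raise KeyError(name)  # sketch-only choice of exception
        return os.path.join(self.folder, name)

view = FolderView('/home/user/plams.12345/plamsjob', 'plamsjob',
                  ['plamsjob.err', 'plamsjob.in', 'plamsjob.out', 'plamsjob.run'])
path = view['$JN.run']

# Only __getitem__ is defined, so item assignment raises TypeError.
try:
    view['newfile.txt'] = '/home/user/abc.txt'
    assignment_allowed = True
except TypeError:
    assignment_allowed = False
```

Leaving out __setitem__ is all that is needed to make the lookup read-only: Python itself refuses the assignment.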

Some external binaries produce files with fixed names during execution (for example ADF’s TAPE21). Such files can be renamed automatically using the _rename_map class attribute:

>>> print(ADFResults._rename_map)
{'TAPE13': '$JN.t13', 'TAPE21': '$JN.t21'}

As the above example shows, _rename_map is a dictionary defining which files should be renamed and how. Renaming is done only once, on collect(). In the generic Results class _rename_map is an empty dictionary.
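A renaming step driven by such a dictionary could look roughly like the following. This is a minimal sketch using only the standard library; the helper name apply_rename_map and the job name water are hypothetical, and the real collect() may handle details (such as subfolders or name clashes) differently.

```python
import os
import tempfile

def apply_rename_map(folder, rename_map, jobname):
    """Rename fixed-name files found in folder according to rename_map."""
    for old, new in rename_map.items():
        src = os.path.join(folder, old)
        if os.path.isfile(src):  # files that were not produced are skipped
            dst = os.path.join(folder, new.replace('$JN', jobname))
            os.rename(src, dst)

rename_map = {'TAPE13': '$JN.t13', 'TAPE21': '$JN.t21'}

# Demo: only TAPE21 exists, so only that file is renamed.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, 'TAPE21'), 'w').close()
    apply_rename_map(folder, rename_map, 'water')
    after = sorted(os.listdir(folder))

print(after)
```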

3.3.2. Synchronization of parallel job executions

One of the main advantages of PLAMS is the ability to run jobs in parallel. The whole job execution mechanism is designed in such a way that there is no need to prepare a special parallel script; the same script can be used for both serial and parallel execution. However, it is important to have a basic understanding of how parallelism in PLAMS works in order to avoid deadlocks and maximize the performance of your scripts.

To run your job in parallel you need to use a parallel job runner:

>>> pjr = JobRunner(parallel=True)
>>> myresults = myjob.run(jobrunner=pjr)

Parallelism is not something that is “enabled” or “disabled” for the entire script: within one script you can use multiple job runners, some of which may be parallel and some serial. However, if you wish to always use the same JobRunner instance, it is convenient to set it as the default at the beginning of your script:

>>> config.default_jobrunner = JobRunner(parallel=True)

All run() calls without jobrunner argument supplied will now use this instance.

When you run a job using a serial job runner, all steps of run() (see Running a job) are done in the main thread and Results instance is returned at the end. On the other hand, when a parallel job runner is used, a new thread is spawned at the beginning of run() and all further work is done in this thread. Meanwhile the main thread proceeds with execution of the script. The important thing is that run() method called in the main thread returns Results instance and allows the whole script to proceed even though the job is still running in a separate thread. This Results instance acts as a “guardian” protecting the job from being accessed while it is still running. Every time you call a method of any Results instance, the guardian checks the status of associated job and if the job is not yet finished, it forces the thread from which the call was done to wait. Thanks to that there is no need to explicitly put synchronization points in the script – results requests serve for that purpose.

Warning

You should NEVER access results in any other way than by a method of Results instance.

The Results class is designed in such a way that each of its methods is automatically wrapped with the access guardian when a Results instance is created. This holds for any Results subclass and for new methods defined by the user, so there is no need to worry about the guardian when extending Results functionality. Binding decorators also recognize when you try to use them with Results and act accordingly. Methods whose names end with two underscores, as well as refresh(), collect() and _clean(), are not wrapped with the guardian. The guardian gives special privileges (earlier access) to postrun() and check() (see Prerun and postrun methods).

Technical

The behavior described above is implemented using Python mechanism called metaclasses. The guardian is simply a decorator wrapping instance methods.
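The combination of a metaclass and a wrapping decorator can be sketched in a few lines. Everything below is illustrative and assumed: the names guarded, GuardMeta, DemoResults and fake_job are hypothetical, a plain threading.Event stands in for the job status, and the wrapping happens at class creation, which is a simplification of the mechanism described above.

```python
import functools
import threading

def guarded(func):
    """Decorator: block the caller until the owner's `done` event is set."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        self.done.wait()
        return func(self, *args, **kwargs)
    return wrapper

class GuardMeta(type):
    """Metaclass wrapping every method whose name does not end with '__'."""
    def __new__(mcs, name, bases, namespace):
        for attr, value in list(namespace.items()):
            if callable(value) and not attr.endswith('__'):
                namespace[attr] = guarded(value)
        return super().__new__(mcs, name, bases, namespace)

class DemoResults(metaclass=GuardMeta):
    def __init__(self):
        self.done = threading.Event()
        self.value = None

    def get_value(self):
        return self.value

def fake_job(results):
    """Stand-in for a running job: store a value, then signal completion."""
    results.value = 42
    results.done.set()

r = DemoResults()
worker = threading.Thread(target=fake_job, args=(r,))
worker.start()
answer = r.get_value()  # waits until fake_job has set the event
worker.join()
```

Because the metaclass is inherited, methods added in subclasses of DemoResults are wrapped in the same way without any extra code.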

If you never request any results of your job and just want to run it, finish() method works as a global synchronization point. It waits for all spawned threads to end before cleaning the environment and exiting your script.

3.3.2.1. Examples

This section provides a handful of examples together with an explanation of common pitfalls and good practices one should keep in mind when writing parallel PLAMS scripts.

Let us start with a simple parallel script that takes all .xyz files in a given folder and for each one calculates a dipole moment magnitude using a single point ADF calculation:

import os
config.default_jobrunner = JobRunner(parallel=True)

folder = '/home/user/xyz'
filenames = sorted(filter(lambda x: x.endswith('.xyz'), os.listdir(folder)))

s = Settings()
s.input.basis.type = 'DZP'
s.input.geometry.SP = True
s.input.xc.gga = 'PBE'

jobs = [ADFJob(molecule=Molecule(os.path.join(folder,f)), name=os.path.splitext(f)[0], settings=s) for f in filenames]
results = [job.run() for job in jobs]

for r in results:
    dipole_vec = r.readkf('Properties', 'Dipole')
    dipole_magn = sum([a*a for a in dipole_vec])**0.5
    print(r.job.name + '\t\t' + str(dipole_magn))

For the purpose of explanation let us assume that the folder /home/user/xyz contains three files: Ammonia.xyz, Ethanol.xyz and Water.xyz. When you run this script, the standard output will look something like:

[14:34:17] Job Ammonia started
[14:34:17] Job Ethanol started
[14:34:17] Job Water started
[14:34:17] Waiting for job Ammonia to finish
[14:34:20] Job Water finished with status 'successful'
[14:34:20] Job Ammonia finished with status 'successful'
Ammonia         0.594949300726
[14:34:21] Waiting for job Ethanol to finish
[14:34:25] Job Ethanol finished with status 'successful'
Ethanol         0.594626131104
Water       0.708226707277

As you can see, print statements from line 18 are mixed with automatic logging messages. Let us examine in more detail what causes this behavior. To do so we will follow what happens in the main thread. In line 5 an alphabetically sorted list of .xyz files from the given directory is created. The list of jobs prepared in line 12 follows the same order, so the job named “Ethanol” will come after “Ammonia” and before “Water”. Line 13 is in effect a for loop (written as a list comprehension) that goes along the list of jobs, runs each of them and collects the returned Results instances in a list called results. If we were using a serial job runner, all work would happen in this line: the “Ethanol” job would start only when “Ammonia” was finished, “Water” would wait for “Ethanol” and the main thread would proceed only when “Water” was done.

In our case we are using a parallel job runner so the first job is started and quickly moves to a separate thread allowing the main thread to proceed to another instruction, which in this case is run() of the “Ethanol” job (and so on). Thanks to that all jobs are started almost immediately one after another, corresponding Results are gathered and the main thread proceeds to line 15 while all three jobs are running “in the background”, handled by separate threads. Now the main thread goes along results list (which follows the same order as filenames and jobs) and tries to obtain a dipole vector for each job. It uses readkf method of Results instance associated with the “Ammonia” job and since this job is still running, the main thread hangs and waits for the job to finish. Meanwhile we can see that the “Water” job ends and this fact is logged. Quickly after that also the “Ammonia” job finishes and the main thread obtains dipole_vec, calculates dipole_magn and prints it. Now the for loop in line 15 continues, this time for the “Ethanol” job. This job seems to be a bit longer than “Ammonia”, so it is still running and the main thread again hangs on the readkf method. After finally obtaining the dipole vector, calculating the magnitude and printing it, the for loop goes on with its last iteration, the “Water” job. This time there is no need to wait since the job is already finished - the result is calculated and printed immediately.

Knowing that, let us consider what would happen if the order of jobs were different. If “Ethanol” were the first job on the list, both other jobs would have finished by the time its results were obtained and printed, so no further waiting would be needed. On the other hand, if the order were “Water”, “Ammonia”, “Ethanol”, the main thread would have to wait every time it executed line 16.

The most important lesson from the above is: the order in which you start jobs does not matter (too much), it is the order of results requests that makes a difference. Of course in our very simple example it influences only the way in which results are mixed with log messages, but in more complicated setups it can directly affect the runtime of your script.

By the way, to solve the problem with mixed print statements and logging messages one could first store data and print it when all results are ready:

to_print = []
for r in results:
    dipole_vec = r.readkf('Properties', 'Dipole')
    dipole_magn = sum([a*a for a in dipole_vec])**0.5
    to_print += [(r.job.name, dipole_magn)]
for nam, dip in to_print:
    print(nam + '\t\t' + str(dip))

Another way could be disabling logging to standard output by putting config.log.stdout = 0 at the beginning of the script (see log()).

Coming back to the main topic, as we have seen above, parallelism in PLAMS is driven by results requests. Not only is the order of requests important, but also (probably even more so) the place from which they are made. To illustrate this we will use the following script, which performs a geometry optimization followed by a frequencies calculation on the optimized geometry:

config.default_jobrunner = JobRunner(parallel=True)

go = ADFJob(name='GeomOpt', molecule=Molecule('geom.xyz'))
go.settings.input.geometry.go = True
... #other settings adjustments for geometry optimisation
go_results = go.run()

opt_geo = go_results.get_molecule('Geometry', 'xyz')

freq = ADFJob(name='Freq', molecule=opt_geo)
freq.settings.input.geometry.frequencies = True
... #other settings adjustments for frequency run
freq_results = freq.run()

do_other_work() # further part of the script, independent of GeomOpt and Freq

Again let us follow the main thread. In line 8 we can see a results request for optimized geometry from “GeomOpt” job. The main thread will then wait for this job to finish before preparing “Freq” job and running it. That means do_other_work(), whatever it is, will not start before “GeomOpt” is done, even though it could, since it is independent of GeomOpt and Freq results. This is bad. The main thread wastes time that could be used for do_other_work() on idle waiting. We need to fix our script:

config.default_jobrunner = JobRunner(parallel=True)

go = ADFJob(name='GeomOpt', molecule=Molecule('geom.xyz'))
go.settings.input.geometry.go = True
... #other settings adjustments for geometry optimisation
go_results = go.run()

freq = ADFJob(name='Freq')
freq.settings.input.geometry.frequencies = True
... #other settings adjustments for frequency run

@add_to_instance(freq)
def prerun(self):
    self.molecule = go_results.get_molecule('Geometry', 'xyz')

freq_results = freq.run()

do_other_work() # further part of the script, independent of GeomOpt and Freq

Now the results request has been moved from the main script to the prerun() method of the “Freq” job. This simple tweak changes everything, since a job’s prerun() is executed in the job’s thread rather than the main thread. That means the main thread starts the “Freq” job immediately after starting the “GeomOpt” job and then directly proceeds to do_other_work(). Meanwhile, in the thread spawned for “Freq”, the results request for the molecule is made and that thread waits for “GeomOpt” to finish.

As seen in the above example, it is extremely important to properly configure jobs that are dependent (setup of one depends on results of another). Resolving all such dependencies in job’s thread rather than the main thread guarantees that waiting for results is done only by the code that really needs them.
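The effect of moving the wait into the dependent job’s thread can be reproduced with plain threads. The sketch below does not use PLAMS at all: geom_opt and freq are hypothetical stand-ins for the two jobs, and threading.Event objects play the role of the guardian. The release_geo event exists only to make the demo deterministic by keeping the first "job" running until the main thread has done its own work.

```python
import threading

geo_ready = threading.Event()    # set when the "GeomOpt" stand-in is finished
release_geo = threading.Event()  # demo-only switch that lets it finish
result = {}

def geom_opt():
    release_geo.wait()           # stands in for a long calculation
    result['geometry'] = 'optimized'
    geo_ready.set()

def freq():
    geo_ready.wait()             # the "results request" is made in this thread
    result['freq_input'] = result['geometry']

t1 = threading.Thread(target=geom_opt)
t2 = threading.Thread(target=freq)
t1.start()
t2.start()

# The main thread is not blocked: it works while both "jobs" are pending.
result['other_work'] = 'done'
other_work_done_first = 'geometry' not in result

release_geo.set()                # now let the "GeomOpt" stand-in finish
t1.join()
t2.join()
```

Only the thread that actually needs the geometry waits for it; the main thread reaches its own work immediately.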

Note

In some cases dependencies between jobs are not easily expressed via methods of Results (for example, one job sets up some environment that is later used by another job). In such cases one can use the job’s depend attribute to explicitly tell the job about other jobs which it has to wait for. Adding job2 to job1.depend is roughly equivalent to putting job2.results.wait() in the prerun() method of job1.

To sum up all the above considerations, here are the rules of thumb for writing properly working parallel PLAMS scripts:

  1. Request results as late as possible, preferably just before using them.
  2. If possible, avoid requesting results in the main thread.
  3. Place the result request in the thread in which this data is later used.

3.3.3. Cleaning job folder

Results instance associated with a job is responsible for cleaning the job folder (removing files that are no longer needed). Cleaning is done automatically, twice for each job, so usually there is no need to manually invoke it.

The first cleaning is done during job execution, just after check() and before postrun(). The value controlling the first cleaning is taken from myjob.settings.keep and should be either a string or a list (see below). This cleaning is rarely needed. It is intended for situations in which your jobs produce large files that you don’t need for further processing. Running many such jobs could deplete your disk quota and cause the whole script to crash. If you wish to get rid of some files produced by your jobs immediately (without having a chance to do anything with them), use this cleaning.

In the majority of cases it is sufficient to use second cleaning, which is performed at the end of your script, when finish() method is called. It is adjusted by myjob.settings.save. You can use second cleaning to remove files that you no longer need after you extracted relevant data earlier in your script.

The argument passed to _clean() (in other words the value that is supposed to be kept in myjob.settings.keep and myjob.settings.save) can be one of the following:

  • 'all' – nothing is removed, cleaning is skipped.
  • 'none' or [] or None – everything is removed from the job folder.
  • list of strings – list of filenames to be kept. Shortcut $JN can be used here, as well as *-wildcards. For example ['geo.*', '$JN.out', 'logfile'] will keep [jobname].out, logfile and all files whose names start with geo. and remove everything else from the job folder.
  • list of strings with the first element '-' – reversed behavior to the above, listed files will be removed. For example ['-', 't21.*', '$JN.err'] will remove [jobname].err and all files whose names start with t21.
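The four cases above can be summarized in a small selection function. This is a sketch under stated assumptions, not the code of _clean(): the name files_to_remove is hypothetical, wildcard matching is delegated to fnmatch (which also understands ? and [] in addition to *), and a string other than 'all' or 'none' is treated as a single pattern to keep.

```python
import fnmatch

def files_to_remove(files, arg, jobname):
    """Return the subset of files that a cleaning argument would delete."""
    if arg == 'all':
        return []                      # nothing is removed
    if arg is None or arg == 'none' or arg == []:
        return list(files)             # everything is removed
    if isinstance(arg, str):
        arg = [arg]                    # assumption: a lone pattern to keep
    reverse = arg[0] == '-'            # leading '-' flips keep into remove
    raw = arg[1:] if reverse else arg
    patterns = [p.replace('$JN', jobname) for p in raw]
    matched = [f for f in files
               if any(fnmatch.fnmatchcase(f, p) for p in patterns)]
    if reverse:
        return matched
    return [f for f in files if f not in matched]

files = ['geo.1', 'geo.2', 'logfile', 'water.err', 'water.in',
         'water.out', 't21.H', 't21.O']
removed_keep = files_to_remove(files, ['geo.*', '$JN.out', 'logfile'], 'water')
removed_minus = files_to_remove(files, ['-', 't21.*', '$JN.err'], 'water')
print(removed_keep)
print(removed_minus)
```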

3.3.3.1. Cleaning for multijobs

Cleaning happens for every job run with PLAMS, either single or multi. That means that if you have, for example, a single job that is a child of some multijob, its job folder will be cleaned two times by two different Results instances that can interfere with each other. Hence it is a good practice to set cleaning only on one level (either parent job or children jobs) and disable cleaning on the other level by using 'all'.

Another shortcut can be used for cleaning in multijobs. $CH is expanded with every possible child name. So for example if you have a multijob mj with 5 single job children (child1, child2 and so on) and you wish to keep only input and output files of children jobs you can set:

>>> mj.settings.save = ['$CH/$CH.in', '$CH/$CH.out']

It is equivalent to using:

>>> mj.settings.save = ['child1/child1.in', 'child2/child2.in', ... , 'child1/child1.out', 'child2/child2.out', ...]

As you can see in the above example, when cleaning a multijob folder you have to keep in mind the fact that files in subfolders are kept as relative paths.
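The expansion of $CH can be pictured as a simple substitution over all child names. The helper expand_children below is hypothetical and only reproduces the equivalence stated above; how PLAMS performs the expansion internally is not shown here.

```python
def expand_children(patterns, children):
    """Replace every pattern containing $CH by one copy per child name."""
    expanded = []
    for pattern in patterns:
        if '$CH' in pattern:
            expanded.extend(pattern.replace('$CH', child) for child in children)
        else:
            expanded.append(pattern)  # patterns without $CH are kept as they are
    return expanded

children = ['child1', 'child2', 'child3']
save = expand_children(['$CH/$CH.in', '$CH/$CH.out'], children)
print(save)
```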

3.3.4. API

class Results(job)

General concrete class for job results.

job attribute stores a reference to associated job. files attribute is a list with contents of the job folder. _rename_map is a class attribute with the dictionary storing the default renaming scheme.

Bracket notation (myresults[filename]) can be used to obtain full absolute paths to files in the job folder.

Instance methods are automatically wrapped with access guardian which ensures thread safety (see Synchronization of parallel job executions).

refresh()

Refresh the contents of files list. Traverse the job folder (and all its subfolders) and collect relative paths to all files found there, except files with .dill extension.

This is a cheap and fast method that should be used every time there is some risk that contents of the job folder changed and files list is no longer up-to-date. For proper working of various PLAMS elements it is crucial that files always contains up-to-date information about contents of job folder.

All functions and methods defined in PLAMS that could change the state of the job folder take care of refreshing files, so there is no need to manually call refresh() after, for example, rename(). If you are implementing a new method of that kind, don’t forget about refreshing.

collect()

Collect the files present in the job folder after execution of the job is finished. This method is simply refresh() plus rename according to _rename_map.

If you wish to override this function, you have to call the parent version at the beginning.

wait()

Wait for associated job to finish.

Technical

This is not an abstract method. It does exactly what it should: nothing. All the work is done by _restrict() decorator that is wrapped around it.

grep_file(filename, pattern='', options='')

Execute grep on a file given by filename and search for pattern.

Additional grep flags can be passed with options, which should be a single string containing all flags, space separated.

Returned value is a list of lines (strings). See man grep for details.

awk_file(filename, script='', progfile=None, **kwargs)

Execute an AWK script on a file given by filename.

The AWK script can be supplied in two ways: either by directly passing the contents of the script (should be a single string) as a script argument, or by providing the path (absolute or relative to the file pointed by filename) to some external file containing the actual AWK script using progfile argument. If progfile is not None, the script argument is ignored.

Other keyword arguments (**kwargs) can be used to pass additional variables to AWK (see -v flag in AWK manual)

Returned value is a list of lines (strings). See man awk for details.
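To make the roles of script, progfile and **kwargs concrete, the sketch below builds the kind of argument list that could be handed to AWK. It relies only on the standard -v and -f flags of awk; the function name build_awk_command is hypothetical and the exact command assembled by awk_file() may differ.

```python
def build_awk_command(filename, script='', progfile=None, **kwargs):
    """Assemble an awk command line as a list of strings (nothing is executed)."""
    cmd = ['awk']
    for name, value in kwargs.items():
        cmd += ['-v', '{}={}'.format(name, value)]  # pass variables to AWK
    if progfile is not None:
        cmd += ['-f', progfile]   # script file wins over the inline script
    else:
        cmd.append(script)
    cmd.append(filename)
    return cmd

inline = build_awk_command('plamsjob.out', script='{print $1*f}', f=2)
from_file = build_awk_command('plamsjob.out', script='ignored', progfile='sum.awk')
print(inline)
print(from_file)
```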

grep_output(pattern='', options='')

Shortcut for grep_file() on the output file.

awk_output(script='', progfile=None, **kwargs)

Shortcut for awk_file() on the output file.

rename(old, new)

Rename a file from files. In both old and new shortcut $JN can be used.

get_file_chunk(filename, begin=None, end=None, match=0, inc_begin=False, inc_end=False, process=None)

Extract a chunk of a text file given by filename consisting of all the lines between a line containing begin and a line containing end.

begin and end should be simple strings (no regular expressions allowed) or None (in that case matching is done from the very beginning or until the very end of the file). If multiple blocks delimited by begin and end are present in the file, match can be used to indicate which one should be returned (match=0 returns all of them). inc_begin and inc_end can be used to include/exclude the delimiting lines in the final result (by default they are excluded).

Returned value is a list of strings. process can be used to provide a function executed on each element of this list before returning it.

get_output_chunk(begin=None, end=None, match=0, inc_begin=False, inc_end=False, process=None)

Shortcut for get_file_chunk() on the output file.

_clean(arg)

Clean the job folder. arg should be a string or a list of strings. See Cleaning job folder for details.

_copy_to(other)

Copy these results to other.

This method is used when Rerun prevention discovers an attempt to run a job identical to the one previously run. Instead of execution, results of the previous job are copied/linked to the new one.

This method is called from the results of the old job and other should be the results of the new job. The goal is to faithfully recreate the state of self in other. To achieve that, all contents of the job folder are copied (or hardlinked, if your platform allows that and self.settings.link_files is True) to other’s job folder. Moreover, all attributes of self (other than job and files) are exported to other using the _export_attribute() method.

_export_attribute(attr, other)

Export this instance’s attribute to other. This method should be overridden in your Results subclass if it has some attribute that is not properly copyable by copy.deepcopy().

other is the Results instance, attr is the value of the attribute to be copied. See SCMJob._export_attribute for an example implementation.

static _replace_job_name(string, oldname, newname)

If string starts with oldname, maybe followed by some extension, replace oldname with newname.
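One possible reading of this rule is sketched below. The function replace_job_name is a hypothetical re-implementation, and the decision to leave names such as waterfall.out untouched (because oldname is not followed by a dot) is an interpretation of "maybe followed by some extension", not a documented guarantee.

```python
def replace_job_name(string, oldname, newname):
    """Swap oldname for newname when it is the whole name or precedes an extension."""
    if string == oldname or string.startswith(oldname + '.'):
        return newname + string[len(oldname):]
    return string

with_ext = replace_job_name('water.out', 'water', 'h2o')
bare = replace_job_name('water', 'water', 'h2o')
unrelated = replace_job_name('waterfall.out', 'water', 'h2o')
print(with_ext, bare, unrelated)
```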

__getitem__(name)

Magic method to enable bracket notation. Elements from files can be used to get absolute paths.

_process_file(filename, command)

Skeleton for all file processing methods. Execute command (should be a list of strings) on filename and return output as a list of lines.

Technical

Other parts of the results module described below are responsible for giving the Results class its unique behavior described in Synchronization of parallel job executions. They are presented here for the sake of completeness; from the user’s perspective this information is rather irrelevant.

class _MetaResults

Metaclass for Results. During new Results instance creation it wraps all methods with _restrict() decorator ensuring proper synchronization and thread safety. Methods listed in _dont_restrict as well as “magic methods” are not wrapped.

_restrict(func)

Decorator that wraps methods of Results instances.

Whenever a decorated method is called, the status of the associated job is checked. Depending on its value, access to the method is granted, refused or the calling thread is forced to wait for the right event to be set.

_caller_name_and_arg(frame)

Extract information about the name and arguments of a function call from a frame object.

_privileged_access()

Analyze contents of the current stack to find out if privileged access to the Results methods should be granted.

Privileged access is granted to two Job methods: postrun() and check(), but only if they are called from _finalize() of the same Job instance.
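Deciding on access by looking at the call stack can be illustrated with the inspect module. The sketch below is deliberately simplified and all of its names (called_from, guarded_method, other) are hypothetical: it only checks function names of the callers and omits the additional conditions mentioned above, namely that the call must come from _finalize() of the same Job instance.

```python
import inspect

def called_from(names):
    """Return True if a caller of our caller is a function named in names."""
    # Frame 0 is called_from itself and frame 1 is the method asking the
    # question, so the search starts at frame 2.
    for frame_info in inspect.stack(context=0)[2:]:
        if frame_info.function in names:
            return True
    return False

def guarded_method():
    return 'privileged' if called_from({'postrun', 'check'}) else 'regular'

def postrun():
    return guarded_method()

def other():
    return guarded_method()

from_postrun = postrun()
from_other = other()
print(from_postrun, from_other)
```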