3.5. Job manager
The job manager is the “commander” of the PLAMS environment. It creates the structure of the working folder, manages its contents, and keeps track of all the jobs you run.
Every instance of JobManager is tied to a working folder. This folder is created when the JobManager instance is initialized, and all the jobs managed by that instance have their job folders inside it. You should not change a job manager’s working folder after it has been created.
When you initialize the PLAMS environment with the init() function, an instance of JobManager is created and stored in config.default_jobmanager. This instance is tied to the main working folder (see The launch script for details) and is used as the default every time some interaction with a job manager is required.
In a normal situation you would never explicitly interact with a JobManager instance (create it manually, call any of its methods, explore its data etc.). All interactions are handled automatically from run() or other methods.
Usually there is no need to use any job manager other than the default one. Splitting work between multiple instances of JobManager may lead to problems (different instances don’t communicate, so Rerun prevention does not work properly). However, it is possible to manually create another instance of JobManager (with a different working folder) and use it for part of your jobs by passing it as the jobmanager keyword argument to run(). If you decide to do so, make sure to pass all the JobManager instances you created manually to finish() (as a list). An example application could be running jobs from your script on many different machines (for example via SSH), with a separate JobManager on each of them.
3.5.1. Rerun prevention
In some situations, for example when running many automatically generated small jobs, it may happen that two or more jobs are identical – they have the same input files. PLAMS has a built-in mechanism to detect such situations and avoid unnecessary work.
During run(), just before the actual job execution, a unique identifier (called a hash) of the job is calculated. The job manager stores the hashes of all previously started jobs and checks whether the hash of the job you are running has already occurred. If so, the execution of the current job is skipped and the results of the previous job are used.
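The bookkeeping described above can be pictured as a dictionary from hashes to the first job that produced them. The sketch below is not the PLAMS implementation: HashRegistry, check_hash and the input_hash callback are hypothetical names, and a "job" is just a dict holding its input text.

```python
import hashlib


class HashRegistry:
    """Toy registry mapping job hashes to the first job that produced them."""

    def __init__(self):
        self.hashes = {}  # hash string -> job object

    def check_hash(self, job, job_hash):
        """Return a previously registered job with the same hash, or None.

        A job with a new hash is registered under that hash.
        """
        h = job_hash(job)
        if h is None:  # hashing disabled: never treat jobs as duplicates
            return None
        if h in self.hashes:
            return self.hashes[h]
        self.hashes[h] = job
        return None


def input_hash(job):
    # Hypothetical hashing callback: SHA256 of the job's input text
    return hashlib.sha256(job["input"].encode()).hexdigest()


registry = HashRegistry()
first = {"name": "a", "input": "energy\n"}
second = {"name": "b", "input": "energy\n"}
print(registry.check_hash(first, input_hash))           # None (new hash, registered)
print(registry.check_hash(second, input_hash)["name"])  # a (duplicate detected)
```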
The results from the previous job’s folder are either copied or linked to the current job’s folder, depending on the link_files key in the previous job’s settings. Linking is done using hard links. Windows does not support hard links, so if you are running PLAMS under Windows the results are always copied.
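A minimal sketch of the link-or-copy decision, using only the standard library. The helper transfer_results is hypothetical; unlike PLAMS, it handles plain files only and does not rename anything, and it falls back to copying if hard-linking fails (for example across filesystems).

```python
import os
import shutil
import tempfile


def transfer_results(src_dir, dst_dir, link_files=True):
    """Populate dst_dir with the plain files found in src_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if not os.path.isfile(src):
            continue  # this sketch skips subdirectories
        if link_files and os.name != "nt":
            try:
                os.link(src, dst)  # hard link: same data, no extra disk space
                continue
            except OSError:
                pass  # e.g. different filesystems: fall back to copying
        shutil.copy2(src, dst)  # copy contents and metadata


src = tempfile.mkdtemp()
dst = os.path.join(tempfile.mkdtemp(), "copy_of_job")
with open(os.path.join(src, "job.out"), "w") as f:
    f.write("final energy: -1.0\n")
transfer_results(src, dst)
print(os.listdir(dst))  # ['job.out']
```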
The crucial part of the whole rerun prevention logic is a properly working hashing function: a function that takes the whole job instance and produces its hash. It needs to produce different hashes for different jobs and exactly the same hash for jobs that do exactly the same work. It is far from trivial to come up with a scheme that works well for all kinds of external binaries, since the technical details of job preparation can differ a lot. The currently implemented method calculates the SHA256 hash of the input and/or runscript contents.
The value of the hashing key in the job manager’s settings selects what is hashed: the input, the runscript, or both. It can also be set to None to disable the rerun prevention.
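To illustrate how the choice of hashed content changes what counts as "the same job", here is a sketch built on hashlib. The method strings 'input', 'runscript' and 'input+runscript' are assumptions made for this example, and job_hash is a hypothetical helper, not a PLAMS function.

```python
import hashlib


def job_hash(input_text, runscript_text, method):
    """Return a SHA256 hex digest of the parts selected by method, or None."""
    if method is None:
        return None  # rerun prevention disabled
    selected = {
        "input": [input_text],
        "runscript": [runscript_text],
        "input+runscript": [input_text, runscript_text],
    }[method]
    sha = hashlib.sha256()
    for part in selected:
        data = part.encode("utf-8")
        # Length prefix keeps ("ab", "c") and ("a", "bc") from colliding
        sha.update(len(data).to_bytes(8, "big"))
        sha.update(data)
    return sha.hexdigest()


inp, run = "task SinglePoint\n", "#!/bin/sh\nams < input\n"
# Hashing only the input ignores runscript differences ...
print(job_hash(inp, run, "input") == job_hash(inp, "other", "input"))  # True
# ... while hashing both makes the two jobs distinct
print(job_hash(inp, run, "input+runscript") == job_hash(inp, "other", "input+runscript"))  # False
```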
If you decide to implement your own hashing method, you can do it by overriding hash_input() and/or hash_runscript().
It may happen that two jobs with the same input and runscript files actually do different work (for example, if they rely on some external file that is supplied using a relative path).
Sometimes it is even desired behavior to run multiple copies of the same job (for example, multiple MD trajectories with the same starting point and random initial velocities).
If you are experiencing problems (PLAMS refuses to run a job because it was already run in the past), you can disable the rerun prevention with
config.default_jobmanager.settings.hashing = None.
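An alternative to disabling rerun prevention is to make the hash aware of the external file. The sketch below uses a stand-in class hierarchy (ToyJob and JobWithExternalFile are hypothetical, not PLAMS classes) to show the idea of overriding a hash_input() method so that the contents of a referenced file also enter the hash.

```python
import hashlib
import os
import tempfile


class ToyJob:
    """Stand-in for a job class; not the PLAMS Job hierarchy."""

    def __init__(self, input_text):
        self.input_text = input_text

    def hash_input(self):
        return hashlib.sha256(self.input_text.encode()).hexdigest()


class JobWithExternalFile(ToyJob):
    """Also hashes the external file that the input refers to."""

    def __init__(self, input_text, external_path):
        super().__init__(input_text)
        self.external_path = external_path

    def hash_input(self):
        sha = hashlib.sha256(self.input_text.encode())
        with open(self.external_path, "rb") as f:
            sha.update(f.read())  # different file content -> different hash
        return sha.hexdigest()


folder = tempfile.mkdtemp()
ext_a, ext_b = os.path.join(folder, "a.xyz"), os.path.join(folder, "b.xyz")
for path, text in [(ext_a, "H 0 0 0\n"), (ext_b, "He 0 0 0\n")]:
    with open(path, "w") as f:
        f.write(text)

same_input = "read geometry.xyz\n"
print(ToyJob(same_input).hash_input() == ToyJob(same_input).hash_input())  # True
print(JobWithExternalFile(same_input, ext_a).hash_input()
      == JobWithExternalFile(same_input, ext_b).hash_input())  # False
```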
Hashing is disabled for MultiJob instances, since they don’t have inputs and runscripts. Single jobs that are children of multijobs are, of course, hashed in the normal way, so trying to run exactly the same multijob twice will not trigger rerun prevention at the multijob level, but for every child job separately, effectively preventing any duplicated work.
3.5.2. Pickling
The lifetime of the whole PLAMS environment is limited to a single script. That means every PLAMS script you run uses its own independent job manager, working folder and configuration settings (config). These objects are initialized at the beginning of the script with the init() command and cease to exist when the script ends. Also, all settings adjustments (apart from those done by editing the defaults file) are local and affect only the current script.
As a consequence, the JobManager of the current script is not aware of any jobs that were run in past scripts. But often it would be very useful to import a previously run job into the current script and use its results or build some new jobs based on it. For that purpose PLAMS offers data preservation for job objects.
Every time the execution of a job successfully finishes (see Running a job), the whole job object is saved to a .dill file using the Python serialization mechanism called pickling. Such a .dill file can be loaded (“unpickled”) in future scripts using the load() function:
oldjob = load('/home/user/science/plams_workdir/myjob/myjob.dill')
This operation brings back the old Job instance in (almost) the same state it was in just after its execution finished.
The default Python pickling package, pickle, is not powerful enough to handle some common PLAMS objects. Fortunately, the dill package provides an excellent replacement for pickle, following the same interface and being able to save and load almost everything.
It is strongly recommended to use dill to ensure proper work of the PLAMS data preservation logic. If dill is not installed for the Python interpreter you’re using to run PLAMS, the regular pickle package will be used instead (which can work if your Job objects are not too fancy, but in most cases it will probably fail). Please install dill: it’s free, easy to get and awesome.
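The "use dill if available, otherwise fall back to pickle" behavior can be sketched with a guarded import; dill mirrors pickle's dump/load interface, so the rest of the code does not care which one was picked. The helpers save and load_file are hypothetical and only serialize an object; PLAMS’s own load() additionally restores the job manager and path.

```python
import os
import tempfile

try:
    import dill as serializer  # preferred: handles many more object types
except ImportError:
    import pickle as serializer  # fallback: fine only for "simple" objects


def save(obj, path):
    """Serialize obj to path with dill if available, else pickle."""
    with open(path, "wb") as f:
        serializer.dump(obj, f)


def load_file(path):
    """Inverse of save()."""
    with open(path, "rb") as f:
        return serializer.load(f)


path = os.path.join(tempfile.mkdtemp(), "myjob.dill")
save({"name": "myjob", "energy": -1.5}, path)
print(serializer.__name__, load_file(path))
```

Plain pickle refuses, for example, lambda functions, which is the kind of object dill can still serialize.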
The pickling mechanism follows references in the pickled object. That means if an object you are trying to pickle contains a reference to another object (just like a Job instance has a reference to a Results instance), that other object is saved too. Thanks to that, there are no “empty” references in your objects after unpickling.
However, every Job instance in PLAMS has a reference to a job manager, which in turn has references to all other jobs, so pickling one job would effectively mean pickling the whole environment. To avoid that, every Job instance needs to be prepared for pickling by removing references to “global” objects, as well as some local attributes (the path to the job folder, for example). During loading, all the removed data is replaced with “proper” values (the current job manager, the current path to the job folder etc.).
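Python’s pickle protocol lets a class decide what gets saved through __getstate__. The sketch below drops a "global" reference and a local path before pickling, then re-attaches current values after loading. ToyJob and restore are hypothetical stand-ins that mimic the behavior described above, not PLAMS code.

```python
import pickle


class ToyJob:
    """Job-like object holding a global reference and a local path."""

    def __init__(self, name, jobmanager, path):
        self.name = name
        self.jobmanager = jobmanager
        self.path = path

    def __getstate__(self):
        # Drop references that would drag the whole environment into the file
        state = self.__dict__.copy()
        for key in ("jobmanager", "path"):
            state.pop(key, None)
        return state


def restore(job, current_manager, current_path):
    """Re-attach "proper" values after unpickling."""
    job.jobmanager = current_manager
    job.path = current_path
    return job


old_manager = {"jobs": ["many", "other", "jobs"]}
data = pickle.dumps(ToyJob("myjob", old_manager, "/old/place/myjob"))
job = pickle.loads(data)
print(hasattr(job, "jobmanager"))  # False
job = restore(job, current_manager={"jobs": []}, current_path="/new/place/myjob")
print(job.name, job.path)  # myjob /new/place/myjob
```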
There is a way of extending the mechanism explained above. If your Job object has an attribute referencing an object you don’t want to save together with the job, you may add this attribute’s name to the job’s _dont_pickle list:
myjob.something = some_big_and_clumsy_object_you_dont_want_to_pickle
myjob._dont_pickle.append('something')
That way the big clumsy object will not be stored in the .dill file. After loading such a .dill file, myjob.something will no longer hold the original object.
The _dont_pickle list is an attribute of every Job instance, initially an empty list. It does not contain the names of attributes that are always removed (like jobmanager); it is meant only for additional ones defined by the user.
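A sketch of how a user-extensible exclusion list can be combined with a fixed one inside __getstate__. ToyJob and ALWAYS_REMOVED are hypothetical; in this sketch the excluded attribute is simply absent after loading.

```python
import pickle


class ToyJob:
    ALWAYS_REMOVED = ("jobmanager",)  # fixed exclusions (assumed for the sketch)

    def __init__(self, name):
        self.name = name
        self.jobmanager = object()  # stands in for a "global" reference
        self._dont_pickle = []      # user-extensible, initially empty

    def __getstate__(self):
        remove = set(self.ALWAYS_REMOVED) | set(self._dont_pickle)
        return {k: v for k, v in self.__dict__.items() if k not in remove}


myjob = ToyJob("myjob")
myjob.something = list(range(1_000_000))  # big object we do not want stored
myjob._dont_pickle.append("something")

restored = pickle.loads(pickle.dumps(myjob))
print(restored.name, hasattr(restored, "something"))  # myjob False
```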
As mentioned above, a job is pickled at the very end of its execution. Whether a job is pickled is decided by the pickle key in the job’s settings, so it can be adjusted for each job separately. If you don’t want a particular job to be pickled, just set myjob.settings.pickle = False. Of course, the global default config.job.pickle can also be used.
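The per-job setting with a global fallback can be pictured as a two-level lookup. In this sketch plain dicts stand in for PLAMS Settings objects and lookup is a hypothetical helper; it only illustrates the "job value first, then global default" idea from the paragraph above.

```python
def lookup(key, job_settings, defaults):
    """Return job_settings[key] if it is set, otherwise the global default."""
    for source in (job_settings, defaults):
        if key in source:
            return source[key]
    raise KeyError(key)


global_job_defaults = {"pickle": True}  # plays the role of config.job
myjob_settings = {}                     # nothing set: the default applies
otherjob_settings = {"pickle": False}   # like otherjob.settings.pickle = False

print(lookup("pickle", myjob_settings, global_job_defaults))     # True
print(lookup("pickle", otherjob_settings, global_job_defaults))  # False
```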
If you modify a job or its corresponding Results instance after it has been pickled, these changes are not reflected in the .dill file, since it was created before the changes happened. To update the .dill file with such changes, you need to repickle the job manually by calling myjob.pickle() after making your changes.
The Results instance associated with the job is saved together with it. However, it does not contain the files produced by the job execution, only relative paths to them. For that reason the .dill file alone is not enough to fully restore the job object if you want to extract or process the results: all other files present in the job folder are needed so that the Results instance can see them. So if you want to copy a previously run job to another location, make sure to copy the whole job folder (including subdirectories).
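With the standard library, copying a complete job folder (subdirectories included) is a single shutil.copytree call. The folder built below is a throwaway stand-in for a real job folder; the file names are illustrative.

```python
import os
import shutil
import tempfile

# Build a throwaway job folder with a subdirectory, standing in for a real one
workdir = tempfile.mkdtemp()
job_folder = os.path.join(workdir, "myjob")
os.makedirs(os.path.join(job_folder, "subdir"))
for rel in ("myjob.dill", "myjob.out", os.path.join("subdir", "extra.txt")):
    with open(os.path.join(job_folder, rel), "w") as f:
        f.write(rel)

# Copy the whole tree, not just the .dill file, so the results stay usable
destination = os.path.join(tempfile.mkdtemp(), "myjob")
shutil.copytree(job_folder, destination)
print(sorted(os.listdir(destination)))  # ['myjob.dill', 'myjob.out', 'subdir']
```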
A job loaded with load() is not registered in the current job manager. That means it does not get its own subfolder in the current working folder, it never gets renamed, and its folder is not cleaned on finish() (see Cleaning job folder). However, it is added to the hash registry, so it is visible to Rerun prevention.
In the case of a MultiJob, all the information about children jobs is stored in the parent’s .dill file, so loading a MultiJob results in loading all its children. Each child job can also have its own .dill file containing information about that particular job only. When pickling, the parent attribute of a job is erased, so loading a child job does not result in loading its parent (and all other children).
3.5.3. Restarting scripts
Pickling and rerun prevention combine into a handy restart mechanism. When your script tries to do something “illegal”, an exception is raised and the script is terminated by the Python interpreter. Usually this is caused by a mistake in the script (a typo, using a wrong variable, accessing a wrong element of a list etc.). In such a case one would like to correct the script and run it again. But some jobs in the terminated script may have already run and successfully finished before the exception occurred. It would be a waste of time to run those jobs again in the corrected script if they are meant to produce exactly the same results as before. The solution is to load all successful jobs from the old script at the beginning of the new one and let Rerun prevention do the rest.
But having to go to the old script’s working folder and manually collect the paths to all .dill files present there would be cumbersome. Fortunately, one can use the load_all() function, which takes a path to the main working folder of some finished PLAMS run and loads all .dill files present there. So when you edit your crashed script to remove mistakes, you can add just one load_all() call at the beginning. Then you run your corrected script and no unnecessary work is done: all the finished jobs are loaded from the previous run, the current run tries to run the same jobs again, but Rerun prevention detects that and copies/links the old jobs’ folders into the current main working folder.
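The discovery step of such a function can be sketched with glob. The helper find_dill_files is hypothetical, and looking only at job folders directly inside the working folder is an assumption of this sketch (child jobs of a multijob are loaded through their parent anyway); each path found would then be passed to load().

```python
import glob
import os
import tempfile


def find_dill_files(workdir):
    """Return .dill files found in the job folders directly inside workdir."""
    return sorted(glob.glob(os.path.join(workdir, "*", "*.dill")))


workdir = tempfile.mkdtemp()
for job in ("opt", "freq"):
    os.makedirs(os.path.join(workdir, job))
    open(os.path.join(workdir, job, job + ".dill"), "wb").close()

for path in find_dill_files(workdir):
    print(os.path.relpath(path, workdir))
```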
If you’re executing your PLAMS scripts with the launch script (see The launch script), restarting is even easier. It can be done in two ways:
If you wish to perform the restart run in a fresh, empty working folder, all you need to do is import the contents of the previous working folder (from the crashed run) using the -l flag:
plams myscript.plms
[17:28:40] PLAMS working folder: /home/user/plams_workdir
#[crashed]
#[correct myscript.plms]
plams -l plams_workdir myscript.plms
[17:35:44] PLAMS working folder: /home/user/plams_workdir.002
This is equivalent to putting load_all('plams_workdir') at the top of myscript.plms and running it with the usual plams myscript.plms command.
If you would prefer an in-place restart in the same working folder, you can use the -r flag:
plams myscript.plms
[17:28:40] PLAMS working folder: /home/user/plams_workdir
#[crashed]
#[correct myscript.plms]
plams -r myscript.plms
[17:35:44] PLAMS working folder: /home/user/plams_workdir
In this case the launch script will temporarily move all the contents of plams_workdir to plams_workdir.res, import all the jobs from there and start a regular run in the now empty plams_workdir.
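The file shuffling behind an in-place restart can be sketched as moving every entry of the working folder into a sibling .res folder. move_to_res is a hypothetical helper that reproduces only this step; importing the jobs and running the script are left out.

```python
import os
import shutil
import tempfile


def move_to_res(workdir):
    """Move everything from workdir into workdir + '.res', leaving workdir empty."""
    res = workdir + ".res"
    os.makedirs(res)
    for name in os.listdir(workdir):
        shutil.move(os.path.join(workdir, name), os.path.join(res, name))
    return res


base = tempfile.mkdtemp()
workdir = os.path.join(base, "plams_workdir")
os.makedirs(os.path.join(workdir, "myjob"))
res = move_to_res(workdir)
print(os.listdir(workdir), os.listdir(res))  # [] ['myjob']
```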
Please remember that rerun prevention checks the hash of the job after the prerun() method is executed. So when you attempt to run a job identical to one previously run (in the same script, or imported from a previous run), its prerun() method is executed anyway, even if the rest of the job execution (see Running a job) is skipped.
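The ordering can be made visible with a toy job that logs each step. ToyJob and its run() are hypothetical and model only the order described above: prerun() first, then the hash check, then either execution or reuse of the earlier job. One consequence of this order, in the sketch at least, is that any change prerun() makes to the input is reflected in the hash.

```python
import hashlib


class ToyJob:
    def __init__(self, name, input_text):
        self.name = name
        self.input_text = input_text
        self.log = []

    def prerun(self):
        self.log.append("prerun")

    def execute(self):
        self.log.append("execute")

    def run(self, registry):
        self.prerun()  # always executed, even for duplicates
        h = hashlib.sha256(self.input_text.encode()).hexdigest()
        if h in registry:
            self.log.append("skipped, reusing " + registry[h].name)
            return
        registry[h] = self
        self.execute()


registry = {}
a, b = ToyJob("a", "same input"), ToyJob("b", "same input")
a.run(registry)
b.run(registry)
print(a.log)  # ['prerun', 'execute']
print(b.log)  # ['prerun', 'skipped, reusing a']
```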
JobManager(settings, path=None, folder=None)
Class responsible for jobs and files management.
Every instance has the following attributes:
foldername – the working folder name.
workdir – the absolute path to the working folder.
logfile – the absolute path to the logfile.
input – the absolute path to the copy of the input file in the working folder.
settings – a Settings instance for this job manager (see below).
jobs – a list of all jobs managed with this instance (in order of run() calls).
names – a dictionary with names of jobs. For each name an integer value is stored indicating how many jobs with that basename have already been run.
hashes – a dictionary working as a hash table for jobs.
The path argument should be a path to a directory inside which the main working folder will be created. If None, the directory from which the whole script was executed is used.
The foldername attribute is initially set to the folder argument. If such a folder already exists, the suffix .002 is appended to folder and the number is increased (.003, .004…) until a non-existing name is found. If folder is None, the name plams_workdir is used, followed by the same procedure to find a unique foldername.
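The naming procedure can be sketched as a loop that tries the bare name first and then increasing numeric suffixes. unique_folder is a hypothetical helper, and zero-padding to three digits is an assumption based on the .002 example.

```python
import os
import tempfile


def unique_folder(path, folder=None):
    """Return a not-yet-existing folder path: name, name.002, name.003, ..."""
    base = os.path.join(path, folder or "plams_workdir")
    candidate, n = base, 2
    while os.path.exists(candidate):
        candidate = "{}.{:03d}".format(base, n)
        n += 1
    return candidate


parent = tempfile.mkdtemp()
for _ in range(3):
    os.makedirs(unique_folder(parent))
print(sorted(os.listdir(parent)))
# ['plams_workdir', 'plams_workdir.002', 'plams_workdir.003']
```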
The settings attribute is directly set to the value of the settings argument (unlike in other classes, where settings are copied), and it should be a Settings instance with the following keys:
hashing – the chosen hashing method (see Rerun prevention).
counter_len – the length of the number appended to the job name in case of a name conflict.
True, all empty subdirectories of the working folder are removed on
__init__(settings, path=None, folder=None)
Load a previously saved job from filename.
filename should be a path to a .dill file in some job folder. A Job instance stored there is loaded and returned. All attributes of this instance removed before pickling are restored. That includes path (the absolute path to the folder containing filename is used) and
default_settings (a list containing only
See Pickling for details.
Remove job from the job manager. Forget its hash.
Register the name of the job.
If a job with the same name was already registered, job is renamed by appending consecutive integers. The number of digits in the appended number is defined by the counter_len value in the job manager’s settings.
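The renaming rule can be sketched with a counter dictionary like the names attribute described above. register_name is a hypothetical helper; the dot separator and zero-padding to counter_len digits are assumptions chosen to mirror the folder suffix format.

```python
def register_name(names, name, counter_len=3):
    """Return a unique job name and update the `names` counter dictionary.

    The first job keeps its name; later ones get a zero-padded suffix.
    """
    if name in names:
        names[name] += 1
        return "{}.{}".format(name, str(names[name]).zfill(counter_len))
    names[name] = 1
    return name


names = {}
print([register_name(names, "opt") for _ in range(3)])  # ['opt', 'opt.002', 'opt.003']
print(names)  # {'opt': 3}
```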
Register the job. Register job’s name (rename if needed) and create the job folder.
Calculate the hash of job and, if it is not None, search previously run jobs for the same hash. If such a job is found, return it. Otherwise, return None.
Clean all registered jobs according to the save parameter in their settings.
True, traverse the working directory and delete all empty subdirectories.
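Deleting empty subdirectories is easiest bottom-up, so that a folder which becomes empty after its children are removed is caught in the same pass. remove_empty_dirs is a hypothetical helper built on os.walk(topdown=False); it keeps the working folder itself.

```python
import os
import tempfile


def remove_empty_dirs(workdir):
    """Delete empty subdirectories of workdir, deepest first; keep workdir."""
    for dirpath, _dirnames, _filenames in os.walk(workdir, topdown=False):
        if dirpath == workdir:
            continue
        if not os.listdir(dirpath):  # re-check: children may have just been removed
            os.rmdir(dirpath)


workdir = tempfile.mkdtemp()
os.makedirs(os.path.join(workdir, "empty_job", "nested_empty"))
os.makedirs(os.path.join(workdir, "full_job"))
open(os.path.join(workdir, "full_job", "job.out"), "w").close()
remove_empty_dirs(workdir)
print(os.listdir(workdir))  # ['full_job']
```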