4.5. Data Set Evaluator

4.5.1. DataSetEvaluator class

DataSetEvaluator is a class with two main functions:

  • DataSetEvaluator.calculate_reference() will evaluate a data_set with any engine settings and set the reference values
  • DataSetEvaluator.run() will evaluate the data_set with any engine settings and provides many functions for comparing the predicted values with the reference values. You can only use run() if all data_set entries already have reference values.

After calling run(), you will be able to get

  • summary statistics like mean absolute error (MAE) and root-mean-squared error (RMSE)
  • partial contributions to the loss function value
  • tables with reference and predicted values in columns next to each other, which can be plotted with params plot
  • summary statistics, partial contributions, and reference-vs-prediction tables grouped by the extractor, expression, or any metadata key-value pairs.

4.5.1.1. Example: DataSetEvaluator.calculate_reference()

Note

The examples below use the plams.Settings class to define a computational engine. See the PLAMS documentation for more information about it.

from scm.params import *
from scm.plams import Settings

dse = DataSetEvaluator()

# any engine settings are possible
engine_settings = Settings()
engine_settings.input.ForceField.Type = 'UFF'

# a job collection is needed, can for example be loaded from disk
job_collection = JobCollection('job_collection.yaml')

# the data_set to be evaluated, can for example be loaded from disk
data_set = DataSet('data_set.yaml')

# print the original expression : reference value
print("Original reference values:")
for ds_entry in data_set:
   print("{}: {}".format(ds_entry.expression, ds_entry.reference))

# calculate reference. Set folder=None to not store the finished jobs on disk (can be faster)
# set overwrite=True to overwrite existing reference values
dse.calculate_reference(job_collection, data_set, engine_settings, overwrite=False, folder='saved_results')

# print the new expression : reference value
print("New reference values:")
for ds_entry in data_set:
   print("{}: {}".format(ds_entry.expression, ds_entry.reference))

4.5.1.2. Example: DataSetEvaluator.run()

from scm.params import *
from scm.plams import Settings

dse = DataSetEvaluator()

# any engine settings are possible
engine_settings = Settings()
engine_settings.input.ForceField.Type = 'UFF'

# a job collection is needed, can for example be loaded from disk
job_collection = JobCollection('job_collection.yaml')

# the data_set to be evaluated, can for example be loaded from disk
data_set = DataSet('data_set.yaml')

# run. Set folder=None to not store the finished jobs on disk (can be faster)
dse.run(job_collection, data_set, engine_settings, folder='saved_results')

# group the results by Extractor and then by Expression
dse.group_by(('Extractor', 'Expression'))

print(dse.str(stats=True, details=True))

# store the calculated results in a format that can later be
# used to initialize another DataSetEvaluator
dse.store('data_set_predictions.yaml')
dse.pickle_dump('data_set_evaluator.pkl')

4.5.1.3. Example: Load a saved DataSetEvaluator

The previous example used the store() and pickle_dump() methods to store the calculated results in text (.yaml) and binary (.pkl) formats. They can be loaded as follows:

from scm.params import *
from scm.plams import Settings

dse = DataSetEvaluator('data_set_predictions.yaml')
print(dse)

# to load from binary .pkl one needs to call the .pickle_load() method
# and provide a path to the original data_set
dse2 = DataSetEvaluator()
dse2.pickle_load('data_set_evaluator.pkl', data_set='data_set.yaml')
print(dse2)

4.5.2. DataSetEvaluator API

class DataSetEvaluator(data_set=None, total_loss=None, residuals=None, contributions=None, raw_predictions=None, predictions=None, modified_reference=None, loss=None)

Convenience class for evaluating a data_set with any engine.

Run the evaluation with the run() function.

Then group the results based on the Extractor, Expression, or metadata key-value pairs with the group_by() method.

Print the results with the str(stats=True, details=True) method:

  • stats=True will give the mean absolute error, root mean squared error, and partial contributions to the loss function
  • details=True will give a table of prediction vs. reference

The results are stored in the results attribute. It is of type GroupedResults, and can be accessed as follows:

>>> dse = DataSetEvaluator()
>>> dse.run(job_collection, data_set, engine_settings)
>>> dse.group_by(('Group', 'SubGroup')) # for grouping by Group and SubGroup metadata keys
>>> dse.results.mae
>>> dse.results.rmse
>>> dse.results['Forces'].mae
>>> dse.results['Forces']['trajectory_1'].mae
>>> str(dse.results)
>>> dse.results.detailed_string()
>>> dse.results['Forces'].str()
>>> dse.results['Forces'].detailed_string()
>>> dse.results['Forces'].residuals
>>> dse.results['Forces'].predictions
>>> dse.results['Forces'].reference_values
etc.
__init__(data_set=None, total_loss=None, residuals=None, contributions=None, raw_predictions=None, predictions=None, modified_reference=None, loss=None)

Typically you should initialize this class without arguments, i.e., as

>>> dse = DataSetEvaluator()

data_set, predictions, residuals, contributions, and total_loss can either be set in this constructor (see the sketch after the parameter descriptions below), or they will be calculated internally by the run() method.

data_set : DataSet
Dataset that was evaluated
total_loss : float
Return value from data_set.evaluate(results, return_residuals=True)[0]
residuals : list
Return value from data_set.evaluate(results, return_residuals=True)[1]
contributions : list
Return value from data_set.evaluate(results, return_residuals=True)[2]
raw_predictions : list
Return value from data_set.evaluate(results, return_residuals=True)[3]
predictions : list
Return value from data_set.get_predictions(raw_predictions, return_reference=True)[0]
modified_reference : list
Return value from data_set.get_predictions(raw_predictions, return_reference=True)[1]
loss : a LossFunction or str
The type of loss function that was used to calculate total_loss
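In the uncommon case that you construct the class by hand, the mapping above translates into the following minimal sketch. It assumes that results is a list of finished job results matching the entries in data_set (how to obtain it is not shown here); loss='sse' is used because that is the default of the run() method:

>>> evaluation = data_set.evaluate(results, return_residuals=True)
>>> total_loss, residuals, contributions, raw_predictions = evaluation[0], evaluation[1], evaluation[2], evaluation[3]
>>> pred = data_set.get_predictions(raw_predictions, return_reference=True)
>>> predictions, modified_reference = pred[0], pred[1]
>>> dse = DataSetEvaluator(data_set=data_set, total_loss=total_loss, residuals=residuals,
...                        contributions=contributions, raw_predictions=raw_predictions,
...                        predictions=predictions, modified_reference=modified_reference, loss='sse')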
calculate_reference(job_collection: scm.params.core.jobcollection.JobCollection, data_set: scm.params.core.dataset.DataSet, engine_settings, overwrite=False, use_pipe=True, folder=None, parallel=None, use_origin=False)

Method to calculate and set the reference values for the entries in data_set. This method will change the data_set!

The method does not modify the DataSetEvaluator instance.

engine_settings : Settings or EngineCollection

If a Settings instance, it defines the reference engine used to calculate all the jobs.

If an EngineCollection, every job in the job_collection must have a ReferenceEngineID (reference_engine) that is present in the EngineCollection; the settings are then taken from the engine collection. If more than one engine is needed to evaluate the jobs, you must pass an EngineCollection (see the sketch after this parameter list).

overwrite : bool
If False, only calculate reference values for data set entries that have no reference value. If True, calculate all reference values.
use_origin : bool

If a job in the job_collection has the “Origin” metadata key pointing to an ams.rkf results file on disk, the results will be loaded from that file instead of rerunning the job.

If both the “Origin” and “Frame” metadata keys exist, the data will be taken from the corresponding frame in the trajectory.

If the Origin, Frame, and OriginalEnergyHartree metadata keys all exist, the energy will be taken from the OriginalEnergyHartree metadata when the ams.rkf in Origin cannot be loaded (for example, because it exists on a different machine).

If loading data from “Origin” or “OriginalEnergyHartree” fails, the job will be run.

job_collection, data_set, use_pipe, folder, and parallel have the same meaning as in the run() method.
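For the EngineCollection route, a minimal sketch, assuming an EngineCollection can be loaded from a YAML file in the same way as a JobCollection or DataSet (the file name engine_collection.yaml is a placeholder) and that every job carries a reference_engine ID present in that collection:

>>> engine_collection = EngineCollection('engine_collection.yaml')
>>> dse.calculate_reference(job_collection, data_set, engine_collection, overwrite=False, folder=None)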

run(job_collection: scm.params.core.jobcollection.JobCollection, data_set: scm.params.core.dataset.DataSet, engine_settings: scm.plams.core.settings.Settings, loss='sse', use_pipe=True, folder=None, parallel=None, group_by=None)

Runs the jobs in the job collection using the engine defined by engine_settings, and evaluates the data_set expressions.

job_collection : JobCollection
The job collection containing the jobs
data_set : DataSet
The data_set containing the expressions to be evaluated
engine_settings : Settings

The engine settings to be used. Example:

>>> engine_settings = Settings()
>>> engine_settings.input.ForceField.Type = 'UFF'
loss : str or Loss
The type of loss function
use_pipe : bool
Whether to use the pipe interface if possible. This will speed up the calculation. Cannot be combined with folder.
folder : str
If folder is not None, the results will be stored on disk in that folder. If the folder already exists, a new one is created. If set, will automatically disable the pipe interface.
parallel : ParallelLevels
Defaults to ParallelLevels(parametervectors=1, processes=1, threads=1). This will run N jobs in parallel, where N is the number of cores on the machine.
group_by : tuple of str
Group results according to the tuple. The grouping can also be changed after the run with the group_by() method.
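A minimal sketch that spells out some of these defaults and groups the results in the same call (the values shown are simply the documented defaults, not recommendations):

>>> dse = DataSetEvaluator()
>>> dse.run(job_collection, data_set, engine_settings,
...         loss='sse',               # the default loss function
...         folder=None,              # do not store finished jobs on disk, so the pipe interface can be used
...         group_by=('Extractor',))  # same effect as calling dse.group_by(('Extractor',)) afterwards
>>> print(dse.str(stats=True, details=True))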
group_by(group_by)

Group the results according to group_by. The run() method needs to be called before calling this method.

group_by : tuple of str
>>> group_by(('Extractor',)) # group by extractor
>>> group_by(('Extractor', 'Expression')) # group by extractor, then by expression (the expressions will be filtered)
>>> group_by(('Group', 'SubGroup')) # group by the metadata key Group, then by the metadata key SubGroup
__str__()

Return str(self).

pickle_load(fname, data_set=None, more_extractors=None)

Loads a DataSetEvaluator from a pickled file.

pickle_dump(fname, data_set_fname=None)

Stores the DataSetEvaluator to a (compressed) pickled file.
The file will be automatically compressed when the file ending is .gz or .gzip.

NOTE: the data_set is not stored in the same file as the DataSetEvaluator! The data_set is only stored if the data_set_fname argument is given.
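A minimal sketch (the file names are placeholders; data_set_fname is assumed to be the file the data_set is written to):

>>> dse.pickle_dump('data_set_evaluator.pkl.gz')  # compressed because of the .gz ending
>>> dse.pickle_dump('data_set_evaluator.pkl', data_set_fname='data_set_backup.yaml')  # also store the data_set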