## Load a data_set_predictions.yaml file

The most common way to use a DataSetEvaluator is to load the ``data_set_predictions.yaml`` file produced during an optimization.

In [3]:
from scm.params import *
import os

# Option 1: obtain the evaluator from the results of a ParAMSJob:
# job = ParAMSJob.load_external('/path/results/')
# dse = job.results.get_data_set_evaluator()

# Option 2: load the .yaml file directly:
yaml_file = os.path.expandvars(
 "$AMSHOME/scripting/scm/params/examples/ZnS_ReaxFF/example_output/best/data_set_predictions.yaml"
)
dse = DataSetEvaluator(yaml_file)

### Summary statistics (stats.txt)

The results can be grouped in different ways. By default, the data is grouped first by ``'Extractor'`` and then by ``'Expression'``. To get a string with the same layout as ``stats.txt``, call the ``.str()`` method:

In [25]:
print(dse.str())

Group/Expression N MAE RMSE* Unit* Weight Loss* Contribution[%]
-------------------------------------------------------------------------------------------------------------------------------------------------
Total 466 0.59112 +0.97646 Mixed! 50.000 236.128 100.00 Total

 forces 297 0.76691 +1.02728 eV/angstrom 2.000 89.301 37.82 Extractor
 band_distorted_clean_110 144 0.89717 +1.14169 eV/angstrom 1.000 54.772 23.20 Expression
 band_distorted_ads_110 153 0.64432 +0.90649 eV/angstrom 1.000 34.529 14.62 Expression

 pes 30 0.09157 +0.14448 eV 13.000 65.858 27.89 Extractor
 pes('bondscan_h2s_pbesol', relative_to=5) 10 0.16696 +0.22823 eV 3.000 52.759 22.34 Expression
 pes('rocksalt', relative_to=2) 5 0.06917 +0.08621 eV 4.000 10.036 4.25 Expression
 pes('anglescan_h2s_pbesol', relative_to=2) 10 0.06506 +0.08106 eV 1.000 2.218 0.94 Expression
 pes('zincblende', relative_to=2) 5 0.01622 +0.02237 eV 5.000 0.844 0.36 Expression

 energy 9 0.63328 +1.63647 eV 9.000 26.796 11.35 Extractor
 1.0

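Note that the extractor name is omitted from an expression when the extractor takes no additional arguments. For example, the table lists ``band_distorted_clean_110`` under ``forces``, whereas ``pes('zincblende', relative_to=2)`` is written out in full. This makes the output more readable.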

You can access individual entries from the above table as follows:

In [26]:
print(len(dse.results["charges"].residuals)) # the N for the charges

110


In [27]:
print(dse.results["charges"]["zincblende_sp"].mae) # MAE for an expression

0.151381045


In [28]:
print(dse.results["forces"].rmse) # RMSE for an extractor

1.0272752938737622


In [29]:
print(dse.results["forces"].unit) # unit for an extractor

eV/angstrom


In [30]:
print(
 dse.results["charges"]["wurtzite_sp"].weight
) # the weight is returned as a scalar, even for array reference values

1.0


In [35]:
print(
 dse.results["energy"]["1.0*zincblende_sp-0.5*wurtzite_sp"].my_loss
) # "my_loss" refers to the loss of the individual entry

1.4626476245701765


In [37]:
print(dse.results["forces"].contribution) # fractional contribution to the weighted loss function

0.3781877244315181


In [38]:
print(dse.results.total_loss) # total loss function value

236.1283753165323


In [39]:
print(dse.results.loss_type) # type of loss function

SSE()


You can also print the summary for only one part of the table:

In [41]:
print(dse.results["forces"].str())

Group/Expression N MAE RMSE* Unit* Weight Loss* Contribution[%]
-----------------------------------------------------------------------------------------------------------------------
forces 297 0.76691 +1.02728 eV/angstrom 2.000 89.301 37.82 Extractor
 band_distorted_clean_110 144 0.89717 +1.14169 eV/angstrom 1.000 54.772 23.20 Expression
 band_distorted_ads_110 153 0.64432 +0.90649 eV/angstrom 1.000 34.529 14.62 Expression
-----------------------------------------------------------------------------------------------------------------------
The weighted total loss function is 236.128.
N: number of numbers averaged for the MAE/RMSE
MAE and RMSE: These are not weighted!
RMSE*: if N == 1 the signed residual (reference-prediction) is given instead of the RMSE.
Unit*: if the unit is "Mixed!" it means that the MAE and RMSE are meaningless.
Loss function type: None. The loss function value is affected by the Weight and Sigma of data_set entries.
Contribution[%]: The contribution to the weig

You can also modify the grouping to only go one level deep:

In [43]:
dse.group_by(("Extractor",)) # the default is group_by(('Extractor', 'Expression'))
print(dse.str())

Group/Expression N MAE RMSE* Unit* Weight Loss* Contribution[%]
-------------------------------------------------------------------------------------------------------
Total 466 0.59112 +0.97646 Mixed! 50.000 236.128 100.00 Total
 forces 297 0.76691 +1.02728 eV/angstrom 2.000 89.301 37.82 Extractor
 pes 30 0.09157 +0.14448 eV 13.000 65.858 27.89 Extractor
 energy 9 0.63328 +1.63647 eV 9.000 26.796 11.35 Extractor
 angle 7 3.59374 +3.79878 degree 7.000 25.254 10.69 Extractor
 distance 12 0.04638 +0.06270 angstrom 12.000 18.869 7.99 Extractor
 charges 110 0.10562 +0.11519 au 6.000 9.138 3.87 Extractor
 dihedral 1 1.91010 +1.91010 degree 1.000 0.912 0.39 Extractor
-------------------------------------------------------------------------------------------------------
The weighted total loss function is 236.128.
N: number of numbers averaged for the MAE/RMSE
MAE and RMSE: These are not weighted!
RMSE*: if N == 1 the signed residual (reference-prediction) is given instead of the RMSE.
Unit*:

If metadata is attached to the training set entries, you can also group by it. For example, when creating a training set with a ``ResultsImporter``, the ``Group`` and ``SubGroup`` metadata are automatically set:

In [44]:
dse.group_by(("Group", "SubGroup"))
print(dse.str())

Group/Expression N MAE RMSE* Unit* Weight Loss* Contribution[%]
--------------------------------------------------------------------------------------------------------------------------
Total 466 0.59112 +0.97646 Mixed! 50.000 236.128 100.00 Total

 Forces 297 0.76691 +1.02728 eV/angstrom 2.000 89.301 37.82 Group
 band_distorted_clean_110 144 0.89717 +1.14169 eV/angstrom 1.000 54.772 23.20 SubGroup
 band_distorted_ads_110 153 0.64432 +0.90649 eV/angstrom 1.000 34.529 14.62 SubGroup

 None 31 0.15023 +0.37134 Mixed! 14.000 66.770 28.28 Group
 bondscan_h2s_pbesol 10 0.16696 +0.22823 eV 3.000 52.759 22.34 SubGroup
 rocksalt 5 0.06917 +0.08621 eV 4.000 10.036 4.25 SubGroup
 anglescan_h2s_pbesol 10 0.06506 +0.08106 eV 1.000 2.218 0.94 SubGroup
 band_110_noconstraints 1 1.91010 +1.91010 degree 1.000 0.912 0.39 SubGroup
 zincblende 5 0.01622 +0.02237 eV 5.000 0.844 0.36 SubGroup

 ReactionEnergy 9 0.63328 +1.63647 eV 9.000 26.796 11.35 Group
 None 9 0.63328 +1.63647 eV 9.000 26.796 11.35 Sub

In [46]:
print(dse.results["Forces"].mae) # capital F in the Group metadata

0.7669139308273065
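
The effect of regrouping on the Loss* and Unit* columns can be imitated in plain Python. The sketch below is not the ParAMS implementation: the ``summarize`` helper and the record layout are hypothetical, Python ``None`` stands in for missing ``Group`` metadata, and the loss values are copied (rounded) from the tables above. It assumes that a group's loss is the sum of its members' losses and that a group whose members have different units is reported as ``Mixed!``.

```python
# Hypothetical records: one per table row, with values copied from the output above.
entries = [
    {"Extractor": "forces", "Group": "Forces", "SubGroup": "band_distorted_clean_110", "unit": "eV/angstrom", "loss": 54.772},
    {"Extractor": "forces", "Group": "Forces", "SubGroup": "band_distorted_ads_110", "unit": "eV/angstrom", "loss": 34.529},
    {"Extractor": "pes", "Group": None, "SubGroup": "bondscan_h2s_pbesol", "unit": "eV", "loss": 52.759},
    {"Extractor": "pes", "Group": None, "SubGroup": "rocksalt", "unit": "eV", "loss": 10.036},
    {"Extractor": "pes", "Group": None, "SubGroup": "anglescan_h2s_pbesol", "unit": "eV", "loss": 2.218},
    {"Extractor": "pes", "Group": None, "SubGroup": "zincblende", "unit": "eV", "loss": 0.844},
    {"Extractor": "dihedral", "Group": None, "SubGroup": "band_110_noconstraints", "unit": "degree", "loss": 0.912},
]


def summarize(entries, keys):
    """Sum the losses at every level of the key hierarchy (hypothetical helper)."""
    nodes = {}
    for entry in entries:
        label = tuple(entry[key] for key in keys)
        # Accumulate into the top-level group, then into each nested level.
        for depth in range(1, len(label) + 1):
            node = nodes.setdefault(label[:depth], {"loss": 0.0, "units": set()})
            node["loss"] += entry["loss"]
            node["units"].add(entry["unit"])
    # A group with more than one distinct unit is labelled "Mixed!".
    return {
        label: (node["loss"], next(iter(node["units"])) if len(node["units"]) == 1 else "Mixed!")
        for label, node in nodes.items()
    }


by_extractor = summarize(entries, ("Extractor",))
by_group = summarize(entries, ("Group", "SubGroup"))

for label, (loss, unit) in by_group.items():
    print(label, f"{loss:.3f}", unit)
```

With these rounded inputs, grouping by ``Extractor`` keeps the ``pes`` and ``dihedral`` rows apart, while grouping by ``Group`` merges them under ``None``, which is why that row shows ``Mixed!`` as its unit.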


### Access individual predictions and reference values (scatter_plots/)

Call the ``.detailed_string()`` method to get a string with the same layout as the files in ``scatter_plots/``, for example ``scatter_plots/forces.txt``.

In [77]:
dse.group_by(("Extractor", "Expression")) # reset to the original grouping
results = dse.results["pes"] # look at the results for the pes extractor
print(results.detailed_string())

#Reference Prediction Unit Sigma Weight WSE* Row* Col* Expression
#------------------------------------------------------------------------------------------------------------------------
+0.419 +0.392 eV 0.054 1.0000 0.255 0 0 pes('zincblende', relative_to=2)
+0.092 +0.092 eV 0.054 1.0000 0.000 1 0 pes('zincblende', relative_to=2)
+0.000 +0.000 eV 0.054 1.0000 0.000 2 0 pes('zincblende', relative_to=2)
+0.078 +0.092 eV 0.054 1.0000 0.065 3 0 pes('zincblende', relative_to=2)
+0.278 +0.317 eV 0.054 1.0000 0.525 4 0 pes('zincblende', relative_to=2)
+0.474 +0.336 eV 0.054 0.8000 5.171 0 0 pes('rocksalt', relative_to=2)
+0.111 +0.083 eV 0.054 0.8000 0.202 1 0 pes('rocksalt', relative_to=2)
+0.000 +0.000 eV 0.054 0.8000 0.000 2 0 pes('rocksalt', relative_to=2)
+0.073 +0.006 eV 0.054 0.8000 1.220 3 0 pes('rocksalt', relative_to=2)
+0.274 +0.161 eV 0.054 0.8000 3.442 4 0 pes('rocksalt', relative_to=2)
+1.335 +0.857 eV 0.054 0.3000 23.227 0 0 pes('bondscan_h2s_pbesol', relative_to=5)
+0.732 +0

In [67]:
print(results.reference_values) # list of reference values

[0.419314, 0.09164115, 0.0, 0.07841534, 0.27771948, 0.47439122, 0.11065178, 0.0, 0.07282489, 0.27377707, 1.33538276, 0.73173446, 0.34163308, 0.11433319, 0.01054105, 0.0, 0.05929865, 0.17040413, 0.3193521, 0.49523743, 0.64987859, 0.20385679, 0.0, 0.04011389, 0.30652919, 0.76818668, 1.37634888, 2.04264645, 2.59729761, 2.8175004]


In [68]:
print(results.predictions) # list of predicted values

[0.39185008, 0.09196291, 0.0, 0.09227838, 0.31714676, 0.33602331, 0.08328586, 0.0, 0.00560645, 0.16088318, 0.85651058, 0.55991041, 0.33346583, 0.16932453, 0.06042187, 0.0, -0.0185412, -0.00165861, 0.04446513, 0.11414837, 0.70031273, 0.18640377, 0.0, 0.09607728, 0.42212549, 0.90330381, 1.45275737, 1.99029061, 2.45778418, 2.825236]
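
As a cross-check, the unweighted MAE and RMSE in the summary table can be recomputed by hand from these two lists. The minimal sketch below uses only the first five values of each list, which belong to ``pes('zincblende', relative_to=2)``, and takes the residual as reference minus prediction, as stated in the table legend. The variable names and the ``rmse_star`` helper are illustrative and are not part of ParAMS.

```python
import math

# First five reference values and predictions printed above (the zincblende PES scan).
reference = [0.419314, 0.09164115, 0.0, 0.07841534, 0.27771948]
prediction = [0.39185008, 0.09196291, 0.0, 0.09227838, 0.31714676]

# Residual convention from the table legend: reference - prediction.
residuals = [ref - pred for ref, pred in zip(reference, prediction)]

mae = sum(abs(res) for res in residuals) / len(residuals)
rmse = math.sqrt(sum(res**2 for res in residuals) / len(residuals))


def rmse_star(residuals):
    """Mimic the RMSE* column: the signed residual if N == 1, otherwise the RMSE."""
    if len(residuals) == 1:
        return residuals[0]
    return math.sqrt(sum(res**2 for res in residuals) / len(residuals))


print(f"MAE  = {mae:.5f} eV")
print(f"RMSE = {rmse:.5f} eV")
```

The two numbers should agree with the ``pes('zincblende', relative_to=2)`` row of the summary table (MAE 0.01622, RMSE 0.02237). Neither quantity involves the weights or sigma values.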


In [69]:
print(results.unit) # the unit

eV


In [70]:
print(results.accuracies) # the Sigma values (per expression)

[0.054422772491975996, 0.054422772491975996, 0.054422772491975996, 0.054422772491975996]


In [71]:
print(results.weights) # the Weights (per reference/prediction)

[1.0, 1.0, 1.0, 1.0, 1.0, 0.8, 0.8, 0.8, 0.8, 0.8, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]


In [72]:
print(results.contributions) # list of individual contributions (per expression)

[0.0035761475951556648, 0.04250420478293943, 0.2234329155570571, 0.009394469642807931]
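
The printed numbers are consistent with a weighted squared error of the form weight × ((reference − prediction) / sigma)² for each value, with the loss of an expression being the sum of those terms and its contribution being that loss divided by the total loss. This form is inferred from the output above for the ``SSE()`` loss type; it is not quoted from the ParAMS source. The sketch below checks it for ``pes('zincblende', relative_to=2)`` using values copied from the output; all variable names are illustrative.

```python
# Values copied from the output above (zincblende PES scan, weight 1.0 for every point).
sigma = 0.054422772491975996
weight = 1.0
total_loss = 236.1283753165323

reference = [0.419314, 0.09164115, 0.0, 0.07841534, 0.27771948]
prediction = [0.39185008, 0.09196291, 0.0, 0.09227838, 0.31714676]

# Assumed form of the weighted squared error (WSE*) for each value.
wse = [weight * ((ref - pred) / sigma) ** 2 for ref, pred in zip(reference, prediction)]

# Assumed: the loss of an expression is the sum of its WSE terms,
# and its contribution is that loss as a fraction of the total loss.
expression_loss = sum(wse)
contribution = expression_loss / total_loss

print([round(value, 3) for value in wse])
print(round(expression_loss, 3), contribution)
```

Under these assumptions the per-value terms match the WSE* column of the detailed output, the sum matches the Loss* value of about 0.844 in the summary table, and the fraction matches the first entry of ``results.contributions``.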


In [78]:
print(results.expressions) # list of expressions

["pes('zincblende', relative_to=2)", "pes('rocksalt', relative_to=2)", "pes('bondscan_h2s_pbesol', relative_to=5)", "pes('anglescan_h2s_pbesol', relative_to=2)"]


Note that the number of reference values is different from the number of expressions when the reference values are arrays. To get the reference values per expression:

In [76]:
for e in results.expressions:
 print(f"Expression: {e}, Ref. values: {results[e].reference_values}")

Expression: pes('zincblende', relative_to=2), Ref. values: [0.419314, 0.09164115, 0.0, 0.07841534, 0.27771948]
Expression: pes('rocksalt', relative_to=2), Ref. values: [0.47439122, 0.11065178, 0.0, 0.07282489, 0.27377707]
Expression: pes('bondscan_h2s_pbesol', relative_to=5), Ref. values: [1.33538276, 0.73173446, 0.34163308, 0.11433319, 0.01054105, 0.0, 0.05929865, 0.17040413, 0.3193521, 0.49523743]
Expression: pes('anglescan_h2s_pbesol', relative_to=2), Ref. values: [0.64987859, 0.20385679, 0.0, 0.04011389, 0.30652919, 0.76818668, 1.37634888, 2.04264645, 2.59729761, 2.8175004]
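
In the output above, the flat lists appear to be the per-expression arrays concatenated in the order of ``results.expressions``. Assuming that ordering holds, a flat list such as ``results.weights`` can be split back into one slice per expression. The sketch below does this with hard-coded copies of the printed values, so it runs without ParAMS; ``counts`` and ``per_expression`` are illustrative names.

```python
from itertools import accumulate
import math

expressions = [
    "pes('zincblende', relative_to=2)",
    "pes('rocksalt', relative_to=2)",
    "pes('bondscan_h2s_pbesol', relative_to=5)",
    "pes('anglescan_h2s_pbesol', relative_to=2)",
]
# Number of reference values per expression, read off the output above.
counts = [5, 5, 10, 10]
# Flat per-value weights, as printed by results.weights.
flat_weights = [1.0] * 5 + [0.8] * 5 + [0.3] * 10 + [0.1] * 10

# Cumulative offsets mark where each expression's slice starts and stops.
offsets = [0, *accumulate(counts)]
per_expression = {
    expression: flat_weights[start:stop]
    for expression, start, stop in zip(expressions, offsets, offsets[1:])
}

for expression, weights in per_expression.items():
    print(f"{expression}: N = {len(weights)}, summed weight = {math.fsum(weights):.3f}")
```

For these four expressions the summed weights come out as 5, 4, 3 and 1, which coincides with the Weight column of the summary table.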
