Benchmarking ML Potentials for Hydrocarbon Isomerization Energies¶

Compare several ML potentials on hydrocarbon isomerization reaction energies and inspect how closely they reproduce the reference ordering and energy differences.

Downloads: Notebook | Script ?

Requires: AMS2026 or later

../_images/benchmark_mlpot_isoc7_24_0_e286b7fd.png

Benchmarking Hydrocarbon Isomerization Energies¶

This benchmark aims to evaluate the accuracy of a range of machine learning potentials included in AMS, using the ISOC7 dataset from the paper: A.Karton and E.Semidalas, “Benchmarking Isomerization Energies for C5–C7 Hydrocarbons: The ISOC7 Database,” J. of Comp. Chem. 47(2) (2026): e70296, https://doi.org/10.1002/jcc.70296.

The ISOC7 dataset contains a diverse range of hydrocarbons from simple alkanes to highly strained systems featuring tetrahedrane units, as well as alkanes, alkenes, alkynes, and cyclic compounds. It has reference data to CCSD(T)/CBS level of accuracy and therefore provides a good benchmark for evaluating the accuracy of ML potentials for small organic systems.

Extract Reference ISOC7 Database¶

We first extract the ISOC7 database:

import os
import tarfile
from pathlib import Path

isoc7_tar = Path(os.environ["AMSHOME"]) / "atomicdata" / "Benchmarks" / "ISOC7.tar.gz"
isoc7 = Path("./ISOC7")

if not isoc7.exists():
    with tarfile.open(isoc7_tar, "r:gz") as tar:
        tar.extractall(path=Path.cwd())

Load XYZ Data¶

Then we load all the xyz files as PLAMS molecules:

import scm.plams as plams

molecules = plams.read_molecules(isoc7, formats=["xyz"])

We can see the number of loaded structures:

print(f"Number of loaded structures: {len(molecules)}")

Number of loaded structures: 1308

And view a selection of them, with their corresponding unique id:

imgs = {
    k: plams.view(v, width=300, height=300, guess_bonds=True)
    for k, v in list(molecules.items())[:4]
}

plams.plot_image_grid(imgs, rows=1);

../_images/benchmark_mlpot_isoc7_11_0_a702026a.png

Load Reference Data¶

Next we load the reference CCSD(T)/CBS energies from the benchmark:

import pandas as pd

reference_data = pd.read_csv(
    isoc7 / "ISOC7_W1-F12_Ref_Values.csv",
    usecols=[
        "reaction_name",
        "species1",
        "species2",
        "W1-F12_isomerization_energy (CCSD(T)/CBS, kcal/mol)",
    ],
)

reference_data

	reaction_name	species1	species2	W1-F12_isomerization_energy (CCSD(T)/CBS, kcal/mol)
0	ISO_1	C5H6_20	C5H6_1	91.439
1	ISO_2	C5H6_20	C5H6_2	90.146
2	ISO_3	C5H6_20	C5H6_3	69.361
3	ISO_4	C5H6_20	C5H6_4	61.970
4	ISO_5	C5H6_20	C5H6_5	56.739
...	...	...	...	...
1292	ISO_1293	C7H14_60	C7H14_55	8.228
1293	ISO_1294	C7H14_60	C7H14_56	6.718
1294	ISO_1295	C7H14_60	C7H14_57	5.015
1295	ISO_1296	C7H14_60	C7H14_58	4.639
1296	ISO_1297	C7H14_60	C7H14_59	4.495

1297 rows × 4 columns

Set up Data Sets with ParAMS¶

To run our benchmark, we are going to user ParAMS. First we set up the JobCollection. This contains the information on the input structures.

import scm.params as params

results_importer = params.ResultsImporter(settings={"units": {"energy": "kcal/mol"}})

for k, v in molecules.items():
    jce = params.JCEntry()
    jce.settings.input.AMS.Task = "SinglePoint"
    jce.molecule = v
    jce.metadata["Source"] = "https://doi.org/10.1002/jcc.70296"
    results_importer.job_collection.add_entry(k, jce)

print(results_importer.job_collection.from_jobids(["C5H10_1"]))

---
dtype: JobCollection
version: '2026.102'
---
ID: 'C5H10_1'
ReferenceEngineID: None
AMSInput: |
   Task SinglePoint
   System
     Atoms
                 C      -2.0912640000       0.3020780000       0.0337330000
                 H      -1.8884470000       1.3092600000      -0.3363480000
                 H      -2.9862710000      -0.0691500000      -0.4691390000
                 H      -2.3168350000       0.3813300000       1.0998540000
                 C      -0.8950030000      -0.6171310000      -0.2050890000
                 H      -1.1307500000      -1.6274460000       0.1449640000
                 H      -0.7058150000      -0.7029420000      -1.2800580000
                 C       0.3616680000      -0.1305570000       0.4747630000
                 H       0.2826960000      -0.0502440000       1.5542310000
                 C       1.7140390000      -0.5126740000      -0.0538700000
                 H       1.7487570000      -1.1235900000      -0.9469660000
                 H       2.5092940000      -0.7118750000       0.6517470000
                 C       1.2230000000       0.9054910000      -0.1923400000
                 H       1.6860140000       1.6700050000       0.4167400000
                 H       0.9267080000       1.2414110000      -1.1782100000
     End
   End
Source: https://doi.org/10.1002/jcc.70296
...

Next we set up the dataset containing the information on each isomerization reaction and the reference energy:

ha_to_kcal_mol = plams.Units.conversion_ratio("Ha", "kcal/mol")

for _, (s1, s2, ref_e) in reference_data[
    ["species1", "species2", "W1-F12_isomerization_energy (CCSD(T)/CBS, kcal/mol)"]
].iterrows():
    results_importer.data_set.add_entry(
        f"energy('{s2}')-energy('{s1}')",
        reference=ref_e,
        unit=("kcal/mol", ha_to_kcal_mol),
        metadata={
            "Source": "https://doi.org/10.1002/jcc.70296",
            "reference_method": "CCSD(T)/CBS",
        },
    )

print(results_importer.data_set.from_jobids(["C5H6_1", "C5H6_20"]))

---
dtype: DataSet
version: '2026.102'
---
Expression: energy('C5H6_1')-energy('C5H6_20')
Weight: 1.0
Sigma: 0.002
ReferenceValue: 91.439
Unit: kcal/mol, 627.5094737775374
Source: https://doi.org/10.1002/jcc.70296
reference_method: CCSD(T)/CBS
...

We can also visualize one of these isomerization reactions, for example the first one with a high isomerization energy:

imgs = {
    k: plams.view(molecules[k], width=300, height=300, guess_bonds=True)
    for k in ["C5H6_20", "C5H6_1"]
}

plams.plot_image_grid(imgs, rows=1);

Now we store the yaml files needed to run the ParAMS calculations:

results_importer.store("./isoc7_benchmark")

['./isoc7_benchmark/job_collection.yaml',
 './isoc7_benchmark/results_importer_settings.yaml',
 './isoc7_benchmark/training_set.yaml']

Note that the dataset can also be viewed using the ParAMS GUI:

# !params -gui ./isoc7_benchmark/job_collection.yaml

Run Benchmarks with ParAMS¶

Now we set up a benchmark job for each ML model of interest, and run them. We include all models suitable for small organic molecules, as well as some trained on materials/other domains for comparison.

models = [
    "AIMNet2-B973c",
    "AIMNet2-wB97MD3",
    "ANI-1ccx",
    "ANI-1x",
    "ANI-2x",
    "eSEN-S-Con-OMol",
    "M3GNet-UP-2022",
    "MACE-MPA-0",
    "UMA-S-1.2-ODAC",
    "UMA-S-1.2-OMat",
    "UMA-S-1.2-OMol",
]

jobs = {}

for model in models:
    job = params.ParAMSJob.from_yaml("./isoc7_benchmark", name=f"benchmark_{model}")
    job.settings.runscript.nproc = 1
    job.settings.runscript.preamble_lines = ["export OMP_NUM_THREADS=1"]
    job.settings.input.parallellevels.jobs = 1
    job.settings.input.mlpotential.model = model
    job.settings.input.task = "SinglePoint"
    jobs[model] = job

for job in jobs.values():
    job.run()
    job.ok()

[17.04|11:24:25] JOB benchmark_AIMNet2-B973c STARTED
[17.04|11:24:25] JOB benchmark_AIMNet2-B973c RUNNING
[17.04|11:24:54] JOB benchmark_AIMNet2-B973c FINISHED
[17.04|11:24:54] JOB benchmark_AIMNet2-B973c SUCCESSFUL
[17.04|11:24:54] JOB benchmark_AIMNet2-wB97MD3 STARTED
[17.04|11:24:54] JOB benchmark_AIMNet2-wB97MD3 RUNNING
[17.04|11:25:22] JOB benchmark_AIMNet2-wB97MD3 FINISHED
[17.04|11:25:22] JOB benchmark_AIMNet2-wB97MD3 SUCCESSFUL
[17.04|11:25:22] JOB benchmark_ANI-1ccx STARTED
[17.04|11:25:22] JOB benchmark_ANI-1ccx RUNNING
[17.04|11:25:35] JOB benchmark_ANI-1ccx FINISHED
[17.04|11:25:35] JOB benchmark_ANI-1ccx SUCCESSFUL
[17.04|11:25:35] JOB benchmark_ANI-1x STARTED
[17.04|11:25:35] JOB benchmark_ANI-1x RUNNING
[17.04|11:25:48] JOB benchmark_ANI-1x FINISHED
[17.04|11:25:48] JOB benchmark_ANI-1x SUCCESSFUL
[17.04|11:25:48] JOB benchmark_ANI-2x STARTED
... output trimmed ....
[17.04|11:27:55] JOB benchmark_M3GNet-UP-2022 SUCCESSFUL
[17.04|11:27:55] JOB benchmark_MACE-MPA-0 STARTED
[17.04|11:27:55] JOB benchmark_MACE-MPA-0 RUNNING
[17.04|11:28:59] JOB benchmark_MACE-MPA-0 FINISHED
[17.04|11:28:59] JOB benchmark_MACE-MPA-0 SUCCESSFUL
[17.04|11:28:59] JOB benchmark_UMA-S-1.2-ODAC STARTED
[17.04|11:28:59] JOB benchmark_UMA-S-1.2-ODAC RUNNING
[17.04|11:31:28] JOB benchmark_UMA-S-1.2-ODAC FINISHED
[17.04|11:31:29] JOB benchmark_UMA-S-1.2-ODAC SUCCESSFUL
[17.04|11:31:29] JOB benchmark_UMA-S-1.2-OMat STARTED
[17.04|11:31:29] JOB benchmark_UMA-S-1.2-OMat RUNNING
[17.04|11:33:54] JOB benchmark_UMA-S-1.2-OMat FINISHED
[17.04|11:33:54] JOB benchmark_UMA-S-1.2-OMat SUCCESSFUL
[17.04|11:33:54] JOB benchmark_UMA-S-1.2-OMol STARTED
[17.04|11:33:54] JOB benchmark_UMA-S-1.2-OMol RUNNING
[17.04|11:36:16] JOB benchmark_UMA-S-1.2-OMol FINISHED
[17.04|11:36:16] JOB benchmark_UMA-S-1.2-OMol SUCCESSFUL

Each completed job now has results for the errors between the model calculations and the reference. For example, with AIMNet2-wB97MD3:

print("\n".join(str(jobs["AIMNet2-wB97MD3"].results.get_data_set_evaluator()).split("\n")[:20]))

Group/Expression            N        MAE          RMSE*        Unit*          Weight     Loss*        Contribution[%]
---------------------------------------------------------------------------------------------------------------------
Total                       1297     1.28848      +1.60312     kcal/mol       1297.000   833315816.952 100.00     Total

   energy                   1297     1.28848      +1.60312     kcal/mol       1297.000   833315816.952 100.00     Extractor
      C7H8_51-C7H8_320      1        7.03500      -7.03500     kcal/mol       1.000      12372796.117 1.48       Expression
      C7H8_3-C7H8_320       1        6.84835      -6.84835     kcal/mol       1.000      11724976.294 1.41       Expression
      C6H6_2-C6H6_56        1        6.80974      -6.80974     kcal/mol       1.000      11593141.526 1.39       Expression
      C7H8_117-C7H8_320     1        6.56349      -6.56349     kcal/mol       1.000      10769839.645 1.29       Expression
      C7H8_79-C7H8_320      1        6.54360      -6.54360     kcal/mol       1.000      10704671.280 1.28       Expression
      C6H6_21-C6H6_56       1        6.35571      -6.35571     kcal/mol       1.000      10098752.565 1.21       Expression
      C7H8_10-C7H8_320      1        6.00152      -6.00152     kcal/mol       1.000      9004556.137  1.08       Expression
      C7H8_256-C7H8_320     1        5.79461      -5.79461     kcal/mol       1.000      8394388.810  1.01       Expression
      C7H8_9-C7H8_320       1        5.52882      -5.52882     kcal/mol       1.000      7641966.144  0.92       Expression
      C7H8_66-C7H8_320      1        5.14151      -5.14151     kcal/mol       1.000      6608776.360  0.79       Expression
      C7H8_4-C7H8_320       1        5.13745      -5.13745     kcal/mol       1.000      6598337.286  0.79       Expression
      C7H8_5-C7H8_320       1        4.50864      -4.50864     kcal/mol       1.000      5081948.943  0.61       Expression
      C7H8_44-C7H8_320      1        4.41894      -4.41894     kcal/mol       1.000      4881765.891  0.59       Expression
      C7H8_266-C7H8_320     1        4.04698      -4.04698     kcal/mol       1.000      4094507.700  0.49       Expression
      C7H10_35-C7H10_396    1        3.99482      +3.99482     kcal/mol       1.000      3989642.301  0.48       Expression

Results can alternatively be displayed in a correlation plot of the reference vs. predicted energies:

ax = jobs["AIMNet2-wB97MD3"].results.plot_simple_correlation("energy", validation_set=False)
ax;

../_images/benchmark_mlpot_isoc7_37_0_3914d105.png

Note that all these can also again be viewed in the ParAMS GUI:

# !params -gui ./plams_workdir/benchmark_AIMNet2-wB97MD3/results

Evaluating Benchmark Results¶

To evaluate the results, we extract the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) from the job results:

results = pd.DataFrame({"Model": models})

results["MAE"] = [jobs[m].results.get_data_set_evaluator().results.mae for m in models]
results["RMSE"] = [jobs[m].results.get_data_set_evaluator().results.rmse for m in models]

results.sort_values(by="MAE", inplace=True)

results.rename(
    columns={
        "MAE": "MAE<sup>*</sup>",
        "RMSE": "RMSE<sup>*</sup>",
    },
    inplace=True,
)

results.style.hide(axis="index").background_gradient(
    subset=["MAE<sup>*</sup>", "RMSE<sup>*</sup>"], cmap="Blues", vmin=0, vmax=10
).format("{:.2f}", subset=["MAE<sup>*</sup>", "RMSE<sup>*</sup>"]).set_caption(
    "<br><span style='text-align:left;font-size:18px'>Benchmark Results for <a href='https://doi.org/10.1002/jcc.70296' target='_blank'>ISOC7 Dataset</a> Isomerization Energies</span>"
    "<br><span style='font-size:14px'><sup>*</sup> Reference method CCSD(T)/CBS, in kcal/mol</span>"
).set_table_styles(
    [
        {"selector": "th", "props": [("border-bottom", "2px solid black")]},
        {"selector": "th, td", "props": [("min-width", "100px")]},
        {"selector": "td", "props": [("border-bottom", "1px solid lightgray")]},
        {"selector": "table", "props": [("border-collapse", "collapse")]},
    ]
)

Benchmark Results for ISOC7 Dataset Isomerization Energies
^* Reference method CCSD(T)/CBS, in kcal/mol
Model	MAE^*	RMSE^*
eSEN-S-Con-OMol	1.07	1.32
UMA-S-1.2-OMol	1.09	1.34
AIMNet2-wB97MD3	1.29	1.60
UMA-S-1.2-OMat	2.88	3.62
ANI-1ccx	3.08	5.17
AIMNet2-B973c	4.16	4.96
UMA-S-1.2-ODAC	4.92	6.23
ANI-1x	5.15	6.65
ANI-2x	5.75	7.29
M3GNet-UP-2022	7.82	10.09
MACE-MPA-0	13.46	15.47

From these results, we see that eSEN-S-Con-OMol, UMA-S-1.2-OMol and AIMNet2-wB97MD3 perform best, achieving the lowest errors (eSEN-S-Con-OMol having MAE = 1.07 kcal/mol, RMSE = 1.32 kcal/mol). We note that all of these are trained on datasets with ωB97M-D3(BJ) as the reference method. In a broader context, the accuracy of these models is therefore on a par with the best performing DFT functionals, at a fraction of the computational cost.

Plotting the correlation between reference and predicted energies shows the success of the model, with essentially no real outliers.

ax = jobs["eSEN-S-Con-OMol"].results.plot_simple_correlation("energy", validation_set=False)
ax;

../_images/benchmark_mlpot_isoc7_48_0_21dedec5.png

Other models targeted at small molecules also perform well, including ANI-1ccx and AIMNet2-B973c, with MAE < 5 kcal/mol.

The correlation plot shows, however, that in this case there are some structures which are not well captured by the model:

ax = jobs["ANI-1ccx"].results.plot_simple_correlation("energy", validation_set=False)
ax;

../_images/benchmark_mlpot_isoc7_50_0_28907fb9.png

We can extract these outlier structures from the data and inspect them.

As noted in the original paper, we find many of the systems with a high deviations compared to the reference values contain a highly strained ring systems or tetrahedrane cores, which is also a common issue with many of the tested DFT functionals.

print("\n".join(str(jobs["ANI-1ccx"].results.get_data_set_evaluator()).split("\n")[:9]))

Group/Expression            N        MAE          RMSE*        Unit*          Weight     Loss*        Contribution[%]
---------------------------------------------------------------------------------------------------------------------
Total                       1297     3.07973      +5.17159     kcal/mol       1297.000   8672162574.204 100.00     Total

   energy                   1297     3.07973      +5.17159     kcal/mol       1297.000   8672162574.204 100.00     Extractor
      C7H10_5-C7H10_396     1        31.12503     -31.12503    kcal/mol       1.000      242191863.432 2.79       Expression
      C7H10_8-C7H10_396     1        30.82095     -30.82095    kcal/mol       1.000      237482804.138 2.74       Expression
      C7H10_6-C7H10_396     1        26.50819     -26.50819    kcal/mol       1.000      175671019.698 2.03       Expression
      C6H6_10-C6H6_56       1        25.57246     +25.57246    kcal/mol       1.000      163487616.549 1.89       Expression

imgs = {
    k: plams.view(molecules[k], width=300, height=300, guess_bonds=True)
    for k in ["C7H10_5", "C7H10_8", "C7H10_6", "C6H6_10"]
}

plams.plot_image_grid(imgs, rows=1);

../_images/benchmark_mlpot_isoc7_53_0_bb895ec9.png

Finally, we find that, as expected, the worst performers in this benchmark are models trained primarily on materials data such as M3GNet-UP-2022 and MACE-MPA-0, which are better suited to other chemical domains.

Python Script¶

#!/usr/bin/env python
# coding: utf-8

# ## Benchmarking Hydrocarbon Isomerization Energies

# This benchmark aims to evaluate the accuracy of a range of machine learning potentials included in AMS, using the ISOC7 dataset from the paper: A.Karton and E.Semidalas, “Benchmarking Isomerization Energies for C5–C7 Hydrocarbons: The ISOC7 Database,” J. of Comp. Chem. 47(2) (2026): e70296, https://doi.org/10.1002/jcc.70296.
#
# The ISOC7 dataset contains a diverse range of hydrocarbons from simple alkanes to highly strained systems featuring tetrahedrane units, as well as alkanes, alkenes, alkynes, and cyclic compounds. It has reference data to CCSD(T)/CBS level of accuracy and therefore provides a good benchmark for evaluating the accuracy of ML potentials for small organic systems.

# ### Extract Reference ISOC7 Database

# We first extract the ISOC7 database:

import os
import tarfile
from pathlib import Path

isoc7_tar = Path(os.environ["AMSHOME"]) / "atomicdata" / "Benchmarks" / "ISOC7.tar.gz"
isoc7 = Path("./ISOC7")

if not isoc7.exists():
    with tarfile.open(isoc7_tar, "r:gz") as tar:
        tar.extractall(path=Path.cwd())


# ### Load XYZ Data

# Then we load all the xyz files as PLAMS molecules:

import scm.plams as plams

molecules = plams.read_molecules(isoc7, formats=["xyz"])


# We can see the number of loaded structures:

print(f"Number of loaded structures: {len(molecules)}")


# And view a selection of them, with their corresponding unique id:

imgs = {k: plams.view(v, width=300, height=300, guess_bonds=True) for k, v in list(molecules.items())[:4]}

plams.plot_image_grid(imgs, rows=1, save_path="picture1.png")
# ### Load Reference Data

# Next we load the reference CCSD(T)/CBS energies from the benchmark:

import pandas as pd

reference_data = pd.read_csv(
    isoc7 / "ISOC7_W1-F12_Ref_Values.csv",
    usecols=[
        "reaction_name",
        "species1",
        "species2",
        "W1-F12_isomerization_energy (CCSD(T)/CBS, kcal/mol)",
    ],
)

reference_data


# ### Set up Data Sets with ParAMS

# To run our benchmark, we are going to user ParAMS. First we set up the JobCollection. This contains the information on the input structures.

import scm.params as params

results_importer = params.ResultsImporter(settings={"units": {"energy": "kcal/mol"}})


for k, v in molecules.items():
    jce = params.JCEntry()
    jce.settings.input.AMS.Task = "SinglePoint"
    jce.molecule = v
    jce.metadata["Source"] = "https://doi.org/10.1002/jcc.70296"
    results_importer.job_collection.add_entry(k, jce)


print(results_importer.job_collection.from_jobids(["C5H10_1"]))


# Next we set up the dataset containing the information on each isomerization reaction and the reference energy:

ha_to_kcal_mol = plams.Units.conversion_ratio("Ha", "kcal/mol")

for _, (s1, s2, ref_e) in reference_data[
    ["species1", "species2", "W1-F12_isomerization_energy (CCSD(T)/CBS, kcal/mol)"]
].iterrows():
    results_importer.data_set.add_entry(
        f"energy('{s2}')-energy('{s1}')",
        reference=ref_e,
        unit=("kcal/mol", ha_to_kcal_mol),
        metadata={
            "Source": "https://doi.org/10.1002/jcc.70296",
            "reference_method": "CCSD(T)/CBS",
        },
    )


print(results_importer.data_set.from_jobids(["C5H6_1", "C5H6_20"]))


# We can also visualize one of these isomerization reactions, for example the first one with a high isomerization energy:

imgs = {k: plams.view(molecules[k], width=300, height=300, guess_bonds=True) for k in ["C5H6_20", "C5H6_1"]}

plams.plot_image_grid(imgs, rows=1, save_path="picture2.png")
# Now we store the yaml files needed to run the ParAMS calculations:

results_importer.store("./isoc7_benchmark")


# Note that the dataset can also be viewed using the ParAMS GUI:

# !params -gui ./isoc7_benchmark/job_collection.yaml


# ### Run Benchmarks with ParAMS

# Now we set up a benchmark job for each ML model of interest, and run them. We include all models suitable for small organic molecules, as well as some trained on materials/other domains for comparison.

models = [
    "AIMNet2-B973c",
    "AIMNet2-wB97MD3",
    "ANI-1ccx",
    "ANI-1x",
    "ANI-2x",
    "eSEN-S-Con-OMol",
    "M3GNet-UP-2022",
    "MACE-MPA-0",
    "UMA-S-1.2-ODAC",
    "UMA-S-1.2-OMat",
    "UMA-S-1.2-OMol",
]


jobs = {}

for model in models:
    job = params.ParAMSJob.from_yaml("./isoc7_benchmark", name=f"benchmark_{model}")
    job.settings.runscript.nproc = 1
    job.settings.runscript.preamble_lines = ["export OMP_NUM_THREADS=1"]
    job.settings.input.parallellevels.jobs = 1
    job.settings.input.mlpotential.model = model
    job.settings.input.task = "SinglePoint"
    jobs[model] = job


for job in jobs.values():
    job.run()
    job.ok()


# Each completed job now has results for the errors between the model calculations and the reference. For example, with AIMNet2-wB97MD3:

print("\n".join(str(jobs["AIMNet2-wB97MD3"].results.get_data_set_evaluator()).split("\n")[:20]))


# Results can alternatively be displayed in a correlation plot of the reference vs. predicted energies:

ax = jobs["AIMNet2-wB97MD3"].results.plot_simple_correlation("energy", validation_set=False)
ax
ax.figure.savefig("picture3.png")


# Note that all these can also again be viewed in the ParAMS GUI:

# !params -gui ./plams_workdir/benchmark_AIMNet2-wB97MD3/results


# ## Evaluating Benchmark Results

# To evaluate the results, we extract the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) from the job results:

results = pd.DataFrame({"Model": models})


results["MAE"] = [jobs[m].results.get_data_set_evaluator().results.mae for m in models]
results["RMSE"] = [jobs[m].results.get_data_set_evaluator().results.rmse for m in models]


results.sort_values(by="MAE", inplace=True)


results.rename(
    columns={
        "MAE": "MAE<sup>*</sup>",
        "RMSE": "RMSE<sup>*</sup>",
    },
    inplace=True,
)


results.style.hide(axis="index").background_gradient(
    subset=["MAE<sup>*</sup>", "RMSE<sup>*</sup>"], cmap="Blues", vmin=0, vmax=10
).format("{:.2f}", subset=["MAE<sup>*</sup>", "RMSE<sup>*</sup>"]).set_caption(
    "<br><span style='text-align:left;font-size:18px'>Benchmark Results for <a href='https://doi.org/10.1002/jcc.70296' target='_blank'>ISOC7 Dataset</a> Isomerization Energies</span>"
    "<br><span style='font-size:14px'><sup>*</sup> Reference method CCSD(T)/CBS, in kcal/mol</span>"
).set_table_styles(
    [
        {"selector": "th", "props": [("border-bottom", "2px solid black")]},
        {"selector": "th, td", "props": [("min-width", "100px")]},
        {"selector": "td", "props": [("border-bottom", "1px solid lightgray")]},
        {"selector": "table", "props": [("border-collapse", "collapse")]},
    ]
)


# From these results, we see that `eSEN-S-Con-OMol`, `UMA-S-1.2-OMol` and `AIMNet2-wB97MD3` perform best, achieving the lowest errors (`eSEN-S-Con-OMol` having MAE = 1.07 kcal/mol, RMSE = 1.32 kcal/mol). We note that all of these are trained on datasets with ωB97M-D3(BJ) as the reference method. In a broader context, the accuracy of these models is therefore on a par with the best performing DFT functionals, at a fraction of the computational cost.
#
# Plotting the correlation between reference and predicted energies shows the success of the model, with essentially no real outliers.

ax = jobs["eSEN-S-Con-OMol"].results.plot_simple_correlation("energy", validation_set=False)
ax
ax.figure.savefig("picture4.png")


# Other models targeted at small molecules also perform well, including `ANI-1ccx` and `AIMNet2-B973c`, with MAE < 5 kcal/mol.
#
# The correlation plot shows, however, that in this case there are some structures which are not well captured by the model:

ax = jobs["ANI-1ccx"].results.plot_simple_correlation("energy", validation_set=False)
ax
ax.figure.savefig("picture5.png")


# We can extract these outlier structures from the data and inspect them.
#
# As noted in the original paper, we find many of the systems with a high deviations compared to the reference values contain a highly strained ring systems or tetrahedrane cores, which is also a common issue with many of the tested DFT functionals.

print("\n".join(str(jobs["ANI-1ccx"].results.get_data_set_evaluator()).split("\n")[:9]))


imgs = {
    k: plams.view(molecules[k], width=300, height=300, guess_bonds=True)
    for k in ["C7H10_5", "C7H10_8", "C7H10_6", "C6H6_10"]
}

plams.plot_image_grid(imgs, rows=1, save_path="picture6.png")


# Finally, we find that, as expected, the worst performers in this benchmark are models trained primarily on materials data such as `M3GNet-UP-2022` and `MACE-MPA-0`, which are better suited to other chemical domains.

Benchmarking ML Potentials for Hydrocarbon Isomerization Energies¶

Benchmarking Hydrocarbon Isomerization Energies¶

Extract Reference ISOC7 Database¶

Load XYZ Data¶

Load Reference Data¶

Set up Data Sets with ParAMS¶

Run Benchmarks with ParAMS¶

Evaluating Benchmark Results¶

See also¶

Related examples¶

Related tutorials¶

Related documentation¶

Python Script¶