FastSigma: a QSPR method to estimate COSMO sigma-profiles

Introduction

The traditional workflow for performing a COSMO-RS/-SAC calculation first involves a very expensive DFT geometry optimization and single point calculation in the COSMO phase to generate \(\sigma\)-profiles and other necessary parameters for COSMO-RS/-SAC. Once these parameters are known, the COSMO-RS/-SAC calculation can be performed extremely efficiently, often in only a matter of milliseconds. This imbalance of computational expense means there is a significant opportunity in circumventing the expensive DFT steps.

The FastSigma program reads a molecule in one of several formats (SMILES, .mol, .sdf) and estimates all of the properties required for a COSMO-RS/-SAC calculation: the HB-/Non-HB-/OT-/OH- \(\sigma\)-profiles, the COSMO surface area, and the COSMO volume, as well as bond energies that can be important for vapor phase or multispecies calculations. The code incorporates two distinct models for estimating these properties. The first (FS1) uses QSPR techniques similar to those applied in our Property Prediction program, which shares the same accepted atom types. The second (SG1) uses a database of \(\sigma\)-profiles and a custom molecular graph hashing algorithm to build the \(\sigma\)-profile of a query molecule from the \(\sigma\)-profiles of database molecules that contain similar substructures.

Both techniques are extremely efficient and can provide estimates of these essential COSMO-RS/-SAC properties in milliseconds. This allows quick thermodynamic calculations for a new molecule of interest and drastically speeds up searches through screening databases of candidate molecules compared with the traditional, full COSMO-RS/-SAC workflow.

Important

To use the SG1 method, users will have to first download the Subgraph Sigma Profile Estimation (SG1) Database (molsg_sg1db) using the AMS Package Manager.

Note

pyCRS can be used for Python scripting with FastSigma. Several Python examples are given in the pyCRS documentation.

Input options

A list of the input options and examples of their usage is given below.

Flag            Purpose                                     Example
-h [--help]     Produce help message                        $AMSBIN/fast_sigma --help
-s [--smiles]   Input molecule as a SMILES string           $AMSBIN/fast_sigma --smiles <SMILES> …
-m [--mol]      Input molecule as a .mol file               $AMSBIN/fast_sigma --mol <mol file> …
--sdf           Input molecule as an .sdf file              $AMSBIN/fast_sigma --sdf <file.sdf> …
--model         Choose one of the two models (FS1 or SG1)   $AMSBIN/fast_sigma --model FS1 …
-d [--display]  Display the results                         $AMSBIN/fast_sigma -d …
-o [--output]   Write output to file                        $AMSBIN/fast_sigma -o <output.compkf> …
--method        Choose a COSMO-RS/-SAC method               $AMSBIN/fast_sigma --method COSMO-RS …

--model FS1

The FS1 model is a QSPR model. It currently has two supported methods: COSMO-RS and COSMOSAC2016. One of these method names must be entered after the --method flag. The default method is COSMO-RS.

--model SG1

The SG1 model is based on substructure hashing and database searching. It currently has two supported methods: COSMO-RS and COSMOSAC2013.

Note

This model can take a few seconds to load the required database. To estimate properties for multiple compounds with this model, it is recommended to use pyCRS. With pyCRS, the database is loaded only during the first calculation and then stays in memory.

-o <output.compkf>

The FastSigma program writes its results to a file in .compkf format. The chosen output filename should generally end with .compkf. This suffix helps other parts of the code (COSMO-RS/-SAC/-UNIFAC/Solvent Optimization) recognize the format and use the file accordingly. If no filename is supplied, the program writes to a file called CRSKF.compkf.

-s <SMILES string>, -m <.mol file>, --sdf <.sdf file>

Though COSMO-RS/-SAC can make estimates for many types of molecular species, the FastSigma program currently supports only organic, neutral, closed-shell molecules.
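The flags described above can be combined into a single command line. The following Python sketch assembles such a command from the documented flags and checks the model/method pairs listed in this section. The helper name build_fast_sigma_command and the rule that exactly one input format is given per call are assumptions made for illustration; they are not part of FastSigma or pyCRS.

```python
import os
import shlex

# Model/method pairs as listed in this section.
SUPPORTED_METHODS = {
    "FS1": {"COSMO-RS", "COSMOSAC2016"},
    "SG1": {"COSMO-RS", "COSMOSAC2013"},
}


def build_fast_sigma_command(smiles=None, mol=None, sdf=None, model=None,
                             method=None, output=None, display=False):
    """Return the argument list for one fast_sigma call (hypothetical helper)."""
    inputs = [("--smiles", smiles), ("--mol", mol), ("--sdf", sdf)]
    given = [(flag, value) for flag, value in inputs if value is not None]
    # Assumption: one molecule source per call.
    if len(given) != 1:
        raise ValueError("give exactly one of smiles, mol, or sdf")

    if model is not None and model not in SUPPORTED_METHODS:
        raise ValueError(f"unknown model: {model}")
    if model is not None and method is not None:
        if method not in SUPPORTED_METHODS[model]:
            raise ValueError(f"{method} is not listed for model {model}")

    # Use the AMSBIN environment variable; keep the literal text if it is unset.
    amsbin = os.environ.get("AMSBIN", "$AMSBIN")
    cmd = [os.path.join(amsbin, "fast_sigma"), *given[0]]
    # Flags are only added when requested, so the program defaults still apply.
    if model is not None:
        cmd += ["--model", model]
    if method is not None:
        cmd += ["--method", method]
    if output is not None:
        cmd += ["--output", output]
    if display:
        cmd.append("-d")
    return cmd


cmd = build_fast_sigma_command(smiles="c1ccccc1(O)", model="SG1",
                               output="phenol.compkf", display=True)
print(shlex.join(cmd))
```

With AMS installed and AMSBIN set, the returned list could be passed to subprocess.run(cmd, check=True). Validating the method before the call avoids starting the program with a combination that this section does not list.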

GUI Input

The simplest way to use the FastSigma program is through the COSMO-RS GUI. There are two ways to do this:

  • SMILES string: Compounds → List of Compounds → Add Compound using FastSigma → SMILES and select Add.

  • .xyz file: Compounds → List of Compounds → Add Compound using FastSigma → .xyz, and select Add.

A .compkf file will be saved that can be used as input in COSMO-RS calculations.

Examples

This example calculates COSMO-RS (the default) parameters for phenol:

$AMSBIN/fast_sigma --smiles "c1ccccc1(O)" -d
       sigma value       Total profile          HB profile
            -0.025               0.000               0.000
            -0.024               0.000               0.000
            -0.023               0.000               0.000
            -0.022               0.002               0.002
            -0.021               0.054               0.054
            -0.020               0.263               0.263
            -0.019               0.523               0.523
            -0.018               0.684               0.684
            -0.017               0.828               0.828
            -0.016               0.801               0.801
            -0.015               0.732               0.716
            -0.014               0.642               0.597
            -0.013               0.653               0.519
            -0.012               0.678               0.487
            -0.011               0.607               0.423
            -0.010               0.567               0.382
            -0.009               0.646               0.245
            -0.008               4.183               0.023
            -0.007               7.405               0.000
            -0.006               7.912               0.000
            -0.005               6.701               0.000
            -0.004               5.544               0.000
            -0.003               4.658               0.000
            -0.002               3.899               0.000
            -0.001               4.097               0.000
             0.000               6.109               0.000
             0.001               7.854               0.000
             0.002               8.640               0.000
             0.003               9.726               0.000
             0.004              11.175               0.000
             0.005              12.524               0.000
             0.006               8.673               0.000
             0.007               2.255               0.000
             0.008               1.174               0.161
             0.009               1.279               1.159
             0.010               1.442               1.442
             0.011               1.759               1.751
             0.012               1.795               1.788
             0.013               0.838               0.829
             0.014               0.095               0.093
             0.015               0.054               0.054
             0.016               0.030               0.030
             0.017               0.000               0.000
             0.018               0.000               0.000
             0.019               0.000               0.000
             0.020               0.000               0.000
             0.021               0.000               0.000
             0.022               0.000               0.000
             0.023               0.000               0.000
             0.024               0.000               0.000
             0.025               0.000               0.000
       Molecular Mass =        94.0418648120 g/mol
           COSMO Area =       127.5012207186 Angstrom**2
         COSMO Volume =       122.0791950835 Angstrom**3
Gas Phase Bond Energy =        -2.9875007647 Hartree
          Bond Energy =        -2.9968155744 Hartree
           Dispersion =        -4.5319123638 kcal/mol
           Deltaediel =         0.0000000000 Hartree
                Nring =         6
     Chemical Formula =         C6H6O
               SMILES =         c1ccccc1(O)
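The text printed by -d has a regular layout: a header line, rows of numbers, and "name = value" lines. The sketch below reads that layout into Python structures. It assumes the layout is exactly as in the output shown above; the function name parse_display_output is hypothetical, and the sample is a short excerpt of the phenol output with most profile rows omitted.

```python
import re

# Excerpt of the phenol output above (most profile rows omitted).
SAMPLE = """\
       sigma value       Total profile          HB profile
            -0.008               4.183               0.023
            -0.007               7.405               0.000
       Molecular Mass =        94.0418648120 g/mol
           COSMO Area =       127.5012207186 Angstrom**2
         COSMO Volume =       122.0791950835 Angstrom**3
     Chemical Formula =         C6H6O
               SMILES =         c1ccccc1(O)
"""


def parse_display_output(text):
    """Return (column names, numeric rows, properties) from -d style text."""
    header = None
    rows = []
    props = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if "=" in stripped:
            # Split on the first "=" only, since a SMILES value may contain "=".
            key, value = stripped.split("=", 1)
            props[key.strip()] = value.strip()
        elif stripped.startswith("sigma value"):
            # Column names are separated by runs of two or more spaces.
            header = re.split(r"\s{2,}", stripped)
        elif header is not None:
            try:
                rows.append([float(token) for token in stripped.split()])
            except ValueError:
                pass  # ignore lines that are not purely numeric
    return header, rows, props


header, rows, props = parse_display_output(SAMPLE)
area = float(props["COSMO Area"].split()[0])  # drop the unit suffix
print(header)
print(rows[0])
print(area)
```

Property values are kept as strings because they carry units; converting the first whitespace-separated token, as done for the area, is one simple way to obtain a number.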

The next example calculates the COSMOSAC2016 parameters for Ibuprofen, read from a .mol file:

$AMSBIN/fast_sigma --mol Ibuprofen.mol --method COSMOSAC2016 -d
       sigma value       Total profile          OH profile          OT profile
            -0.025               0.000               0.000               0.000
            -0.024               0.000               0.000               0.000
            -0.023               0.000               0.000               0.000
            -0.022               0.000               0.000               0.000
            -0.021               0.009               0.009               0.000
            -0.020               0.062               0.061               0.000
            -0.019               0.395               0.385               0.000
            -0.018               0.914               0.881               0.000
            -0.017               0.925               0.879               0.000
            -0.016               0.840               0.781               0.000
            -0.015               0.652               0.590               0.000
            -0.014               0.697               0.606               0.000
            -0.013               0.604               0.499               0.000
            -0.012               0.561               0.398               0.000
            -0.011               0.725               0.418               0.000
            -0.010               0.833               0.350               0.000
            -0.009               1.282               0.230               0.000
            -0.008               2.141               0.158               0.000
            -0.007               5.133               0.085               0.000
            -0.006              10.428               0.048               0.000
            -0.005              14.386               0.000               0.000
            -0.004              23.816               0.000               0.000
            -0.003              26.081               0.000               0.000
            -0.002              23.295               0.000               0.000
            -0.001              21.443               0.000               0.000
             0.000              22.124               0.000               0.000
             0.001              20.652               0.000               0.000
             0.002              24.315               0.036               0.000
             0.003              15.722               0.086               0.035
             0.004              11.878               0.171               0.092
             0.005              13.670               0.288               0.197
             0.006              10.405               0.381               0.307
             0.007               5.479               0.561               0.413
             0.008               3.525               0.713               0.613
             0.009               3.358               0.823               1.055
             0.010               3.879               0.639               1.840
             0.011               4.503               0.180               3.025
             0.012               2.708               0.083               2.006
             0.013               0.930               0.020               0.745
             0.014               0.061               0.000               0.104
             0.015               0.000               0.000               0.000
             0.016               0.000               0.000               0.000
             0.017               0.000               0.000               0.000
             0.018               0.000               0.000               0.000
             0.019               0.000               0.000               0.000
             0.020               0.000               0.000               0.000
             0.021               0.000               0.000               0.000
             0.022               0.000               0.000               0.000
             0.023               0.000               0.000               0.000
             0.024               0.000               0.000               0.000
             0.025               0.000               0.000               0.000
       Molecular Mass =       206.1306798160 g/mol
           COSMO Area =       278.4276940312 Angstrom**2
         COSMO Volume =       279.3341044098 Angstrom**3
Gas Phase Bond Energy =        -7.1463537624 Hartree
          Bond Energy =        -7.1619486814 Hartree
           Dispersion =        -9.7153055452 kcal/mol
           Deltaediel =         0.0007518662 Hartree
                Nring =         6
     Chemical Formula =         C13H18O2
               SMILES =         CC(C)Cc1ccc(C(C)C(=O)O)cc1
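One way to sanity-check a printed profile is to compare the sum of the Total profile column with the reported COSMO Area. The sketch below does this for the Ibuprofen output above, using the nonzero Total profile entries copied from the table. That the two numbers should agree is an observation about the outputs in this section, not a documented guarantee of the program.

```python
# Nonzero "Total profile" entries copied from the Ibuprofen output above.
total_profile = [
    0.009, 0.062, 0.395, 0.914, 0.925, 0.840, 0.652, 0.697, 0.604,
    0.561, 0.725, 0.833, 1.282, 2.141, 5.133, 10.428, 14.386, 23.816,
    26.081, 23.295, 21.443, 22.124, 20.652, 24.315, 15.722, 11.878,
    13.670, 10.405, 5.479, 3.525, 3.358, 3.879, 4.503, 2.708, 0.930,
    0.061,
]
cosmo_area = 278.4276940312  # Angstrom**2, as printed above

# The table is printed with three decimals, so small rounding
# differences between the sum and the area are expected.
profile_sum = sum(total_profile)
difference = abs(profile_sum - cosmo_area)

print(f"sum of Total profile = {profile_sum:.3f}")
print(f"COSMO Area           = {cosmo_area:.3f}")
print(f"difference           = {difference:.3f}")
```

For this output the two values agree closely, which suggests that each Total profile entry can be read as the amount of surface area (in Angstrom**2) assigned to that sigma value.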

The SG1 model can also be used for phenol:

$AMSBIN/fast_sigma --smiles "c1ccccc1(O)" --model SG1 -d
       sigma value       Total profile          HB profile
            -0.025               0.000               0.000
            -0.024               0.000               0.000
            -0.023               0.000               0.000
            -0.022               0.003               0.003
            -0.021               0.067               0.067
            -0.020               0.434               0.434
            -0.019               0.878               0.878
            -0.018               0.995               0.995
            -0.017               0.996               0.996
            -0.016               0.942               0.940
            -0.015               0.771               0.766
            -0.014               0.684               0.635
            -0.013               0.610               0.549
            -0.012               0.693               0.486
            -0.011               0.671               0.397
            -0.010               0.755               0.350
            -0.009               1.344               0.255
            -0.008               4.312               0.026
            -0.007               7.751               0.000
            -0.006               7.855               0.000
            -0.005               6.819               0.000
            -0.004               6.226               0.000
            -0.003               5.612               0.000
            -0.002               4.654               0.000
            -0.001               4.679               0.000
             0.000               4.969               0.000
             0.001               5.814               0.000
             0.002               7.672               0.000
             0.003              10.711               0.000
             0.004              12.231               0.000
             0.005              12.061               0.000
             0.006               8.394               0.000
             0.007               3.355               0.000
             0.008               1.677               0.153
             0.009               1.434               1.226
             0.010               1.566               1.566
             0.011               1.972               1.972
             0.012               2.133               2.133
             0.013               0.966               0.966
             0.014               0.062               0.062
             0.015               0.000               0.000
             0.016               0.000               0.000
             0.017               0.000               0.000
             0.018               0.000               0.000
             0.019               0.000               0.000
             0.020               0.000               0.000
             0.021               0.000               0.000
             0.022               0.000               0.000
             0.023               0.000               0.000
             0.024               0.000               0.000
             0.025               0.000               0.000
       Molecular Mass =        94.0418648120 g/mol
           COSMO Area =       133.1606910587 Angstrom**2
         COSMO Volume =       122.0268006780 Angstrom**3
Gas Phase Bond Energy =        -2.9830476046 Hartree
          Bond Energy =        -2.9928087890 Hartree
           Dispersion =         0.0000000000 kcal/mol
           Deltaediel =         0.0000000000 Hartree
                Nring =         6
     Chemical Formula =         C6H6O
               SMILES =         c1ccccc1(O)
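Since phenol has now been estimated with both models, the scalar properties of the two outputs can be compared directly. The sketch below computes the relative difference between the FS1 and SG1 values, using numbers copied from the two phenol outputs in this section. It only illustrates how far the two estimates lie apart for this one molecule and says nothing about which model is more accurate.

```python
# Scalar properties copied from the two phenol outputs in this section.
fs1 = {
    "COSMO Area": 127.5012207186,    # Angstrom**2
    "COSMO Volume": 122.0791950835,  # Angstrom**3
    "Bond Energy": -2.9968155744,    # Hartree
}
sg1 = {
    "COSMO Area": 133.1606910587,
    "COSMO Volume": 122.0268006780,
    "Bond Energy": -2.9928087890,
}


def relative_difference(reference, other):
    """Signed difference of `other` from `reference`, relative to |reference|."""
    return (other - reference) / abs(reference)


differences = {name: relative_difference(fs1[name], sg1[name]) for name in fs1}

for name, rel in differences.items():
    print(f"{name:>12}: FS1 = {fs1[name]:.4f}  SG1 = {sg1[name]:.4f}  "
          f"rel. diff = {rel:+.2%}")
```

For phenol, the COSMO areas differ by a few percent, while the volumes and bond energies are much closer.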

A warning message is displayed if a molecule contains atoms or substructures that are not listed in the accepted atom types table. For example, in the compound C1=CC=[Ge]C=C1, the atom 'Ge' is not available in the QSPR method. As a result, the estimated sigma profile will be inaccurate.

$AMSBIN/fast_sigma --smiles "C1=CC=[Ge]C=C1" -d
  WARNING: there are atoms and/or substructures in the molecule which cannot be estimated.
  This will affect the accuracy of the results.

  Atoms which cannot be estimated:
  Ge

       sigma value       Total profile          HB profile
            -0.025               0.000               0.000
            -0.024               0.000               0.000
            -0.023               0.000               0.000
            -0.022               0.000               0.000
            -0.021               0.000               0.000
            -0.020               0.000               0.000
            -0.019               0.000               0.000
            -0.018               0.000               0.000
            -0.017               0.000               0.000
            -0.016               0.000               0.000
            -0.015               0.000               0.000
            -0.014               0.000               0.000
            -0.013               0.000               0.000
            -0.012               0.000               0.000
            -0.011               0.000               0.000
            -0.010               0.000               0.000
            -0.009               0.000               0.000
            -0.008               0.896               0.000
            -0.007               2.280               0.000
            -0.006               5.170               0.000
            -0.005               9.078               0.000
            -0.004               9.044               0.000
            -0.003               4.854               0.000
            -0.002               4.211               0.000
            -0.001               4.505               0.000
            -0.000               4.415               0.000
             0.001               4.824               0.000
             0.002               4.750               0.000
             0.003               5.745               0.000
             0.004               3.006               0.000
             0.005               4.904               0.000
             0.006               5.411               0.000
             0.007               4.222               0.000
             0.008               2.623               0.000
             0.009               0.000               0.000
             0.010               0.000               0.000
             0.011               0.000               0.000
             0.012               0.000               0.000
             0.013               0.000               0.000
             0.014               0.000               0.000
             0.015               0.000               0.000
             0.016               0.000               0.000
             0.017               0.000               0.000
             0.018               0.000               0.000
             0.019               0.000               0.000
             0.020               0.000               0.000
             0.021               0.000               0.000
             0.022               0.000               0.000
             0.023               0.000               0.000
             0.024               0.000               0.000
             0.025               0.000               0.000
       Molecular Mass =       138.9603029600 g/mol
           COSMO Area =        79.9378826526 Angstrom**2
         COSMO Volume =        88.0443110798 Angstrom**3
Gas Phase Bond Energy =        -2.2538026599 Hartree
          Bond Energy =        -2.2571102789 Hartree
           Dispersion =        -2.9625031363 kcal/mol
           Deltaediel =         0.0000000000 Hartree
                Nring =         6
     Chemical Formula =         C5H5Ge
               SMILES =         C1=CC=[Ge]C=C1
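When FastSigma is run over many molecules, it can be useful to flag results that carry this warning. The sketch below scans captured -d output for the warning block and returns the listed atoms. It assumes the warning text is exactly as shown above; how several atoms would be listed (one per line or separated by spaces or commas) is not shown in this section, so the code accepts all of these. The function name unsupported_atoms is hypothetical.

```python
# Beginning of the output shown above for C1=CC=[Ge]C=C1.
WARNING_EXCERPT = """\
  WARNING: there are atoms and/or substructures in the molecule which cannot be estimated.
  This will affect the accuracy of the results.

  Atoms which cannot be estimated:
  Ge

       sigma value       Total profile          HB profile
            -0.025               0.000               0.000
"""


def unsupported_atoms(text):
    """Return the atom symbols listed under 'Atoms which cannot be estimated:'."""
    atoms = []
    collecting = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Atoms which cannot be estimated"):
            collecting = True
            continue
        if not collecting:
            continue
        if stripped.startswith("sigma value"):
            break  # the profile table has started
        if not stripped:
            if atoms:
                break  # a blank line after the list ends it
            continue
        # Assumption: symbols may be separated by spaces or commas.
        atoms.extend(stripped.replace(",", " ").split())
    return atoms


print(unsupported_atoms(WARNING_EXCERPT))
```

An empty list means no such warning block was found in the text, so a screening script could keep only the molecules for which this function returns nothing.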