Database¶
This submodule contains several classes that provide an interface to an SQL database for managing COSKF files and physical properties.
- class pyCRS.Database.COSKFDatabase(path: str)¶
A class that provides an interface to an SQL database containing the following tables.
Table name – Description
Compound – Unique compounds with COSKF files by CAS number or identifier.
Conformer – Multiple conformers with corresponding COSKF files.
PhysicalProperty – User-defined physical properties.
PropPred – Estimated properties using QSPR methods from SMILES.
- Parameters:
path (str) – Path to the database file. Created if it doesn’t exist.
Example:
from pyCRS.Database import COSKFDatabase
db = COSKFDatabase("my_coskf_db.db")
db.add_compound("Water.coskf")
db.add_compound("Benzene.coskf", cas="71-43-2")
db.add_physical_property("Benzene", "meltingpoint", 278.7)
db.add_physical_property("Benzene", "hfusion", 9.91, unit="kJ/mol")
db.estimate_physical_property("Benzene")
- add_compound(coskf_file: str, name: str | None = None, cas: str | None = None, identifier: str | None = None, coskf_path: str | None = None, smiles: str | None = None, nring: int | None = None, ignore_smiles_check: bool = False, ignore_duplicates: bool = False)¶
Adds a new .coskf file to the database.
- Parameters:
coskf_file (str) – Path to the .coskf file, or just the file name of the .coskf file if coskf_path is provided.
- Keyword Arguments:
name (str, optional) – Compound name. Defaults to the IUPAC name, CAS number, identifier, or .coskf file name if not specified. Can be set via keyword argument or read from the .coskf file.
cas (str, optional) – CAS number. If not provided, the value from the .coskf file is used if available.
identifier (str, optional) – Chemical identifier of the compound.
coskf_path (str, optional) – Directory containing the .coskf file. Defaults to the ADFCRS-2018 database path.
smiles (str, optional) – SMILES string. Defaults to the value in the .coskf file if available.
nring (int, optional) – Number of ring atoms. Defaults to the value from the .coskf file.
ignore_smiles_check (bool, optional) – If True, skips the identity check via SMILES generation. Defaults to False.
ignore_duplicates (bool, optional) – If True, skips duplicate recognition using UniqueConformersCrest in the AMSConformer tool. Defaults to False.
Note
Each compound must have a unique CAS number or identifier.
During add_compound, CAS and identifier are checked for uniqueness in the database.
An error is raised if multiple compounds share the same CAS number or identifier.
The example below is invalid because both compounds use the same identifier, CRS0001.
db.add_compound("Benzene.coskf", cas="71-43-2", identifier="CRS0001")
db.add_compound("Ethanol.coskf", cas="64-17-5", identifier="CRS0001")
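When loading many files at once, it can help to check a batch for clashes before calling add_compound. The sketch below is a stand-alone illustration of the uniqueness rule in the note above. It does not call pyCRS, and `find_clashes` and the `(file, cas, identifier)` tuple layout are hypothetical names chosen for this example. It assumes that a repeated CAS number or a repeated identifier each counts as a clash.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

# (coskf_file, cas, identifier): hypothetical layout used only in this sketch
Entry = Tuple[str, Optional[str], Optional[str]]


def find_clashes(entries: Iterable[Entry]) -> Tuple[Set[str], Set[str]]:
    """Return (repeated CAS numbers, repeated identifiers) within a batch."""
    entries = list(entries)
    # Missing values (None or "") are ignored; only real values can clash.
    cas_counts = Counter(cas for _, cas, _ in entries if cas)
    id_counts = Counter(ident for _, _, ident in entries if ident)
    repeated_cas = {c for c, n in cas_counts.items() if n > 1}
    repeated_ids = {i for i, n in id_counts.items() if n > 1}
    return repeated_cas, repeated_ids


# The invalid pair from the note: different CAS numbers, same identifier.
batch = [
    ("Benzene.coskf", "71-43-2", "CRS0001"),
    ("Ethanol.coskf", "64-17-5", "CRS0001"),
]
repeated_cas, repeated_ids = find_clashes(batch)
print(repeated_cas, repeated_ids)
```

A non-empty result in either set signals that the batch should be fixed before it is passed to the database.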
- add_physical_property(identifier: str, attribute: str, value: float | str, unit: str | None = None)¶
Adds a value of a physical property to the PhysicalProperty TABLE in the database, using the compound’s identifier.
- Parameters:
identifier (str) – CAS number, identifier or compound name.
attribute (str) – Name of the physical property (e.g. meltingpoint or hfusion).
value (float or str) – Value of the physical property.
unit (str, optional) – Unit of the input value. The default units are K, kcal/mol and kcal/mol-K. The following units are accepted and are automatically converted to the default units:
- Temperature: K, C
- Enthalpy: kcal/mol, kJ/mol, cal/g, J/g
- Heat capacity: kcal/mol-K, kJ/mol-K, cal/g-K, J/g-K
- pvap: bar, atm, Pa, mmHg
Note
The vp_equation accepts only parameters for pressure in bar and temperature in Kelvin.
db.add_physical_property("Benzene", "meltingpoint", 278.7)
db.add_physical_property("Benzene", "hfusion", 9.91, unit="kJ/mol")
db.add_physical_property("Benzene", "vp_equation", "Antoine")
db.add_physical_property("Benzene", "vp_params", "4.72583, 1660.652, -1.461")
db.add_physical_property("Benzene", "flashpoint", -11.63, unit="C")
# Vapor pressure at 353.25 K is 1.01325 bar
db.add_physical_property("Benzene", "tvap", 353.25)
db.add_physical_property("Benzene", "pvap", 1.01325)
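To make the unit handling concrete, the following sketch shows what conversion to the default units amounts to for the molar and pressure units listed above. It is an illustration only: `to_default` is a hypothetical helper, the conversion factors are the usual textbook values, and whether pyCRS uses exactly these factors internally is an assumption. The per-gram units (cal/g, J/g and their per-K forms) also need the molar mass and are left out.

```python
# Standard conversion factors. That pyCRS applies exactly these values
# is an assumption of this sketch.
KCAL_PER_KJ = 1.0 / 4.184  # thermochemical calorie
BAR_PER_UNIT = {
    "bar": 1.0,
    "atm": 1.01325,
    "Pa": 1.0e-5,
    "mmHg": 133.322387415e-5,
}


def to_default(value: float, unit: str) -> float:
    """Convert a value to K, kcal/mol, kcal/mol-K or bar (hypothetical helper)."""
    if unit in ("K", "kcal/mol", "kcal/mol-K"):
        return value
    if unit == "C":
        return value + 273.15
    if unit in ("kJ/mol", "kJ/mol-K"):
        return value * KCAL_PER_KJ
    if unit in BAR_PER_UNIT:
        return value * BAR_PER_UNIT[unit]
    # cal/g, J/g, cal/g-K and J/g-K require the molar mass of the compound.
    raise ValueError(f"unit not handled in this sketch: {unit}")


print(to_default(9.91, "kJ/mol"))  # benzene hfusion in kcal/mol
print(to_default(-11.63, "C"))     # benzene flash point in K
```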
- clear_physical_property(identifier: str | List[str] | None = None, attribute: str | List[str] | None = None)¶
Clears the value of a physical property in PhysicalProperty TABLE in the database by compound’s identifier
- Parameters:
identifier (str or List[str], optional) – CAS number, chemical identifier or compound name as a string or a list of strings. If None, all compounds are selected.
attribute (str or List[str], optional) – Specific property to clear as a string or a list of strings. If None, all properties are cleared.
db.clear_physical_property(["water", "benzene"])
- del_row(dbrow: CompoundRow | Dict[str, List[CompoundRow]])¶
Remove a compound from the database and delete the corresponding .coskf file.
- Parameters:
dbrow (CompoundRow or Dict[str, List[CompoundRow]]) – The row to remove from the database.
- del_row_by_conformer_id(conformer_id: int)¶
Remove the conformer from the database.
- Parameters:
conformer_id (int) – An integer representing the conformer ID in the CONFORMER TABLE.
db.del_row_by_conformer_id(1)
- del_rows(dbrows: List[CompoundRow] | Dict[str, List[CompoundRow]])¶
Remove multiple compounds from the database and delete the corresponding .coskf files.
- Parameters:
dbrows (List[CompoundRow] or Dict[str, List[CompoundRow]]) – The rows to remove from the database.
db.del_rows(db.get_compounds('benzene'))
- estimate_physical_property(identifier: str | List[str] | None = None, compound_id: int | List[int] | None = None)¶
Estimate the physical properties using the property prediction tool and add the values to the PropPred TABLE in the database
- Keyword Arguments:
identifier (str or List[str], optional) – CAS number, chemical identifier or compound name as a string or a list of strings.
compound_id (int or List[int], optional) – An integer or a list of integers representing the compound ID(s).
Note
The QSPR descriptor used in the property prediction tool is determined from the SMILES string. The selection priority of SMILES is as follows: (1) User-provided SMILES via the add_compound() method. (2) SMILES read from the .coskf file. (3) SMILES generated by OpenBabel using the compound’s coordinates in the .coskf file. Please note that the automatically resolved SMILES may be incorrect for some molecules, for instance when bond orders cannot be automatically determined or when the species is charged.
db.estimate_physical_property("Benzene")
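The three-level SMILES priority in the note can be pictured as a simple fallback chain. The sketch below is a stand-alone illustration, not the pyCRS implementation: `pick_smiles` and its argument names are hypothetical, and it assumes that a missing SMILES is represented by None or an empty string.

```python
from typing import Optional


def pick_smiles(
    user: Optional[str],
    from_coskf: Optional[str],
    resolved: Optional[str],
) -> Optional[str]:
    """Return the first available SMILES in the documented priority order."""
    # Order: (1) user-provided, (2) read from the .coskf file,
    # (3) generated from the coordinates.
    for candidate in (user, from_coskf, resolved):
        if candidate:  # skips both None and empty strings
            return candidate
    return None


# No user-provided SMILES, so the value read from the .coskf file wins.
print(pick_smiles(None, "c1ccccc1", "C1=CC=CC=C1"))
```

Because the third source is the least reliable one, supplying the SMILES explicitly through add_compound is the safest choice for charged species or molecules with ambiguous bond orders.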
- get_all_compounds() List[CompoundRow] ¶
Retrieves all compounds in the database.
- Returns:
The full list of CompoundRow instances in the database
- Return type:
List[CompoundRow]
- get_all_conformers() List[ConformerRow] ¶
Retrieves all conformers in the database.
- Returns:
The full list of ConformerRow instances in the database.
- Return type:
List[ConformerRow]
- get_all_physical_properties(source: str = 'PhysicalProperty') List[PhysicalPropertyRow] | List[PropPredRow] ¶
Retrieves all physical properties in the database.
- Parameters:
source (str, optional) – Source of the properties.
- ‘PhysicalProperty’ (default): Returns properties from the PhysicalProperty table.
- ‘PropPred’: Returns estimated properties from the PropPred table.
- Returns:
A list of PhysicalPropertyRow instances or PropPredRow instances in the database.
- Return type:
List[PhysicalPropertyRow] or List[PropPredRow]
- get_attribute_by_compound_id(attributes: str | List[str], compound_id: int | List[int], source: str | List[str] | None = None)¶
Retrieve the list of values for compounds with specified compound_id(s) in the database
- Parameters:
attributes (str or List[str]) – Attribute(s) to be retrieved.
compound_id (int or List[int]) – An integer or a list of integers used to search for compounds in the COMPOUND TABLE.
source (str or List[str], optional) – The table(s) used in the search. Defaults to the COMPOUND TABLE and the PhysicalProperty TABLE.
- Returns:
A list of tuples containing the values of the specified attributes for the compounds.
- Return type:
list of attributes
db.get_attribute_by_compound_id("name", 1)
db.get_attribute_by_compound_id(["name", "cas", "hfusion"], 1)
db.get_attribute_by_compound_id(["name", "hfusion"], 1, source=["COMPOUND", "PropPred"])
- get_compounds(identifier: str | List[str]) Dict[str, CompoundRow] ¶
Retrieves compounds from the COMPOUND TABLE in the database by matching CAS number, chemical identifier, or name.
- Parameters:
identifier (str or List[str]) – CAS number, chemical identifier or compound name as a string or a list of strings.
- Returns:
A dictionary in which each key is an input identifier and the value is the corresponding CompoundRow instance(s).
- Return type:
Dict[str, CompoundRow]
- get_compounds_id(identifier: str | List[str]) List[int | None] ¶
Retrieves compound id from the COMPOUND TABLE in the database by matching CAS number, chemical identifier, or name.
- Parameters:
identifier (str or List[str]) – CAS number, chemical identifier or compound name as a string or a list of strings.
- Returns:
A list of compound IDs corresponding to the input identifiers. If an identifier is not found, None is returned at the corresponding position.
- Return type:
List[Optional[int]]
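Because missing entries come back as None in the same position as the input, the result is easiest to handle by pairing it with the input list. The sketch below shows one way to do that. `split_found` is a hypothetical helper, and the `ids` list is a made-up stand-in for the return value of get_compounds_id so that the snippet runs without a database.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def split_found(
    identifiers: Sequence[str],
    ids: Sequence[Optional[int]],
) -> Tuple[Dict[str, int], List[str]]:
    """Pair identifiers with their IDs and collect the ones that were not found."""
    found: Dict[str, int] = {}
    missing: List[str] = []
    for ident, cid in zip(identifiers, ids):
        # Compare against None explicitly: an ID of 0 is falsy but may be valid.
        if cid is None:
            missing.append(ident)
        else:
            found[ident] = cid
    return found, missing


names = ["Benzene", "NotInDatabase", "Water"]
ids = [2, None, 1]  # stand-in for db.get_compounds_id(names); values are made up
found, missing = split_found(names, ids)
print(found, missing)
```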
- get_conformers(identifier: str | List[str]) Dict[str, ConformerRow] ¶
Retrieves conformers from the CONFORMER TABLE in the database by matching CAS number, chemical identifier, or name.
- Parameters:
identifier (str or List[str]) – CAS number, chemical identifier or compound name as a string or a list of strings.
- Returns:
A dictionary in which each key is an input identifier and the value is the ConformerRow instance(s) that match it.
- Return type:
Dict[str, ConformerRow]
- get_physical_properties(identifier: str | List[str] | None = None, compound_id: int | List[int] | None = None, source: str = 'PhysicalProperty') List[PhysicalPropertyRow] | List[PropPredRow] ¶
Retrieves physical properties in the database by matching CAS number, chemical identifier, name or compound ID.
- Parameters:
identifier (str or List[str], optional) – CAS number, chemical identifier or compound name as a string or a list of strings. If None, compound_id must be provided.
compound_id (int or List[int], optional) – Compound ID as an integer or a list of integers. If None, identifier must be provided.
source (str, optional) – Source of the properties.
- ‘PhysicalProperty’ (default): Returns properties from the PhysicalProperty table.
- ‘PropPred’: Returns estimated properties from the PropPred table.
- Returns:
A list of PhysicalPropertyRow or PropPredRow instances, depending on the source.
- Return type:
List[PhysicalPropertyRow] or List[PropPredRow]
- modify_attribute_by_compound_id(attribute: str, value: str | int, compound_id: int)¶
Modifies the value of a specified attribute for a given compound ID.
- Parameters:
attribute (str) – Attribute to modify. It can be one of the following: ‘name’, ‘cas’, ‘identifier’, ‘smiles’, ‘nring’.
value (str or int) – The new value of the specified attribute.
compound_id (int) – An integer representing the compound ID.
db.modify_attribute_by_compound_id("identifier","InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H", 0)
- update_compound_by_conformer_id(compound_id: int, conformer_id: int)¶
Update the data for a compound ID row in the COMPOUND TABLE using the data from a conformer ID row in the CONFORMER TABLE.
- Parameters:
compound_id (int) – An integer representing the compound ID of a specific row in the COMPOUND TABLE of the database.
conformer_id (int) – An integer representing the conformer ID of a specific row in the CONFORMER TABLE of the database.
- update_compound_by_lowestE(compound_id: int | List[int] | None = None)¶
Update the data for a compound ID row in the COMPOUND TABLE using the data from a conformer ID row with the lowest energy having the same compound ID in the CONFORMER TABLE.
- Keyword Arguments:
compound_id (int or List[int], optional) – Compound ID as an integer or a list of integers. If None, updates all compounds in the database.
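The selection step behind this method can be sketched as a per-compound minimum over conformer energies. The example below is a stand-alone illustration and not the pyCRS implementation: `Conf` is a reduced stand-in for ConformerRow, the energies are made-up numbers, and ranking by Ecosmo rather than Egas is an assumption, since the description only says "lowest energy".

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Conf:
    """Reduced stand-in for ConformerRow; only the fields this sketch needs."""
    conformer_id: int
    compound_id: int
    Ecosmo: float  # kcal/mol; ranking by Ecosmo is an assumption


def lowest_energy_per_compound(rows: Iterable[Conf]) -> Dict[int, Conf]:
    """Map each compound_id to its conformer with the lowest Ecosmo."""
    best: Dict[int, Conf] = {}
    for row in rows:
        current = best.get(row.compound_id)
        if current is None or row.Ecosmo < current.Ecosmo:
            best[row.compound_id] = row
    return best


# Made-up energies for two compounds; compound 1 has two conformers.
rows = [Conf(1, 1, -1500.120), Conf(2, 1, -1501.345), Conf(3, 2, -820.500)]
best = lowest_energy_per_compound(rows)
print({cid: row.conformer_id for cid, row in best.items()})
```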
- visualize_conformers(compound_id: int | None = None, identifier: str | None = None)¶
Visualize conformers in ascending order of conformer ID.
- Parameters:
compound_id (int, optional) – Compound ID for which conformers are visualized.
identifier (str, optional) – CAS number, chemical identifier or compound name.
- class pyCRS.Database.CompoundRow(compound_id: int, conformer_id: int, name: str, cas: str, identifier: str, smiles: str, resolved_smiles: str, coskf: str, Egas: float, Ecosmo: float, nring: int)¶
A data class to represent the contents of a row in the COMPOUND TABLE of COSKFDatabase
- compound_id¶
A unique identifier for a specific row in the COMPOUND TABLE of the database
- Type:
int
- conformer_id¶
A unique identifier for a specific row in the CONFORMER TABLE of the database
- Type:
int
- name¶
The name associated with the row in the COMPOUND TABLE
- Type:
str
- cas¶
The CAS number associated with the row, i.e., the compound
- Type:
str
- identifier¶
The chemical identifier associated with the row, i.e., the compound
- Type:
str
- smiles¶
The SMILES string provided by user
- Type:
str
- resolved_smiles¶
The derived SMILES string obtained using OpenBabel from the coordinates in the COSKF file.
- Type:
str
- coskf¶
The filename of the .coskf file stored in the local SCM_PYCRS_COSKF_DB directory
- Type:
str
- Egas¶
The gas phase bond energy rounded to 3 decimal places in kcal/mol
- Type:
float
- Ecosmo¶
The bond energy in a perfect conductor rounded to 3 decimal places in kcal/mol
- Type:
float
- nring¶
The number of ring atoms
- Type:
int
- db_path¶
The path to the .coskf file directory
- Type:
str
- get_full_coskf_path()¶
Returns the full path of the corresponding .coskf file
- read_coskf()¶
Opens the .coskf file corresponding to the database entry and returns a scm.plams.KFFile instance
- class pyCRS.Database.ConformerRow(conformer_id: int, compound_id: int, name: str, cas: str, identifier: str, smiles: str, resolved_smiles: str, coskf: str, Egas: float, Ecosmo: float, nring: int)¶
A data class to represent the contents of a row in the CONFORMER TABLE of COSKFDatabase
- conformer_id¶
A unique identifier for a specific row in the CONFORMER TABLE of the database
- Type:
int
- compound_id¶
A unique identifier for a specific row in the COMPOUND TABLE of the database
- Type:
int
- name¶
The name associated with the row in the CONFORMER TABLE
- Type:
str
- cas¶
The CAS number associated with the row, i.e., the compound
- Type:
str
- identifier¶
The chemical identifier associated with the row, i.e., the compound
- Type:
str
- smiles¶
The SMILES string provided by user
- Type:
str
- resolved_smiles¶
The derived SMILES string obtained using OpenBabel from the coordinates in the COSKF file
- Type:
str
- coskf¶
The filename of the .coskf file stored in the local SCM_PYCRS_COSKF_DB directory
- Type:
str
- Egas¶
The gas phase bond energy rounded to 3 decimal places in kcal/mol
- Type:
float
- Ecosmo¶
The bond energy in a perfect conductor rounded to 3 decimal places in kcal/mol
- Type:
float
- nring¶
The number of ring atoms
- Type:
int
- db_path¶
The path to the .coskf file directory
- Type:
str
- get_full_coskf_path()¶
Returns the full path of the corresponding .coskf file
- read_coskf()¶
Opens the .coskf file corresponding to the database entry and returns a scm.plams.KFFile instance
- class pyCRS.Database.PhysicalPropertyRow(compound_id: int, meltingpoint: float, hfusion: float, cpfusion: float, boilingpoint: float, density: float, flashpoint: float, dielectricconstant: float, vp_equation: str, vp_params: str, tvap: float, pvap: float, Mn: float)¶
A data class to represent the contents of a row in the PhysicalProperty TABLE of COSKFDatabase
- compound_id¶
A unique identifier for a specific row in the COMPOUND TABLE of the database
- Type:
int
- meltingpoint¶
melting temperature (K)
- Type:
float
- hfusion¶
enthalpy of fusion (kcal/mol)
- Type:
float
- cpfusion¶
heat capacity of fusion (kcal/mol-K) calculated as the difference between the heat capacity in the liquid state and the heat capacity in the solid state.
- Type:
float
- boilingpoint¶
boiling point (K)
- Type:
float
- density¶
liquid density (kg/L)
- Type:
float
- flashpoint¶
flash point (K)
- Type:
float
- dielectricconstant¶
dielectric constant
- Type:
float
- vp_equation¶
The vapor pressure equation to use. Unit in bar. Options include: ANTOINE, VPM1 and DIPPR101
- Type:
str
- vp_params¶
Parameters for the vp_equation, expressed as “A, B, C, D, E”
- Type:
str
- tvap¶
Temperature (K) at pvap
- Type:
float
- pvap¶
Pressure (bar) at tvap
- Type:
float
- Mn¶
polymer average molecular weight (g/mol)
- Type:
float
- Vapor Pressure Equations:
- ANTOINE:
log10(P) = A - B/(C+T)
- DIPPR101:
ln(P) = A + B/T + C*ln(T) + D*T**E
- VPM1:
ln(P) = A/T + B*ln(T) + C*T + D
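The three equations can be evaluated directly once vp_params has been split into numbers. The sketch below is a stand-alone illustration with P in bar and T in Kelvin, as stated for vp_equation. The function names and `parse_params` are hypothetical helpers, not part of pyCRS. It uses the benzene Antoine parameters from the add_physical_property example and evaluates them at the tvap value from that example.

```python
import math
from typing import List


def antoine(T: float, A: float, B: float, C: float) -> float:
    """ANTOINE: log10(P) = A - B/(C+T)"""
    return 10.0 ** (A - B / (C + T))


def dippr101(T: float, A: float, B: float, C: float, D: float, E: float) -> float:
    """DIPPR101: ln(P) = A + B/T + C*ln(T) + D*T**E"""
    return math.exp(A + B / T + C * math.log(T) + D * T ** E)


def vpm1(T: float, A: float, B: float, C: float, D: float) -> float:
    """VPM1: ln(P) = A/T + B*ln(T) + C*T + D"""
    return math.exp(A / T + B * math.log(T) + C * T + D)


def parse_params(text: str) -> List[float]:
    """Split a vp_params string such as "A, B, C" into floats."""
    return [float(x) for x in text.split(",")]


# Benzene Antoine parameters and tvap taken from the example above.
params = parse_params("4.72583, 1660.652, -1.461")
p = antoine(353.25, *params)
print(f"{p:.3f} bar")
```

The computed pressure lands close to the pvap of 1.01325 bar stored for benzene at 353.25 K, which is a quick consistency check between the equation parameters and the tvap/pvap pair.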
- class pyCRS.Database.PropPredRow(compound_id: int, adopt_smiles: str, meltingpoint: float, hfusion: float, boilingpoint: float, density: float, flashpoint: float, dielectricconstant: float, vp_equation: str, vp_params: str)¶
A data class to represent the contents of a row in the PropPred TABLE of COSKFDatabase
- compound_id¶
A unique identifier for a specific row in the COMPOUND TABLE of the database
- Type:
int
- adopt_smiles¶
The SMILES used for QSPR method
- Type:
str
- meltingpoint¶
melting temperature (K)
- Type:
float
- hfusion¶
enthalpy of fusion (kcal/mol)
- Type:
float
- boilingpoint¶
boiling point (K)
- Type:
float
- density¶
liquid density (kg/L)
- Type:
float
- flashpoint¶
flash point (K)
- Type:
float
- dielectricconstant¶
dielectric constant
- Type:
float
- vp_equation¶
The vapor pressure equation to use (VPM1). Unit in bar.
- Type:
str
- vp_params¶
Parameters for the vp_equation, expressed as “A, B, C, D, E”
- Type:
str