To start, we need to get the code and install the package. It is best to work in a clean virtual environment. Assuming you are using conda, you can install the code as follows:
conda create -n ccs python=3.8
conda activate ccs
git clone git@github.com:Jammyzx1/Carbon-capture-fingerprint-generation.git
cd Carbon-capture-fingerprint-generation
pip install .
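As a quick sanity check after installation, the sketch below tests whether the package can be found in the active environment. It uses only the Python standard library; the helper name `is_installed` is my own for illustration, and the package name `ccsfp` is taken from the import used later in this article.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found in this environment."""
    return importlib.util.find_spec(package_name) is not None


# Reports whether the conda environment created above can see the package.
print("ccsfp importable:", is_installed("ccsfp"))
```

If this prints False, check that the ccs environment is active and that pip install was run from inside the cloned repository.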
Once the package is installed, you can use the CCS representation directly with the code below. This produces a 72-element representation.
import ccsfp.informatics.finger_prints as fp

smiles = ["c1ccccc1CN", "c1ccncc1", "c1ccccc1CNC", "c1ccccc1CCN(C)C"]
fps_smi, df_fps_smi, smarts_smi = fp.ccs_fp(smiles)
The output has three parts:
- fps_smi: A list of RDKit ExplicitBitVect objects
- df_fps_smi: A pandas data frame of the fingerprints, in which each row corresponds to one of the molecules
- smarts_smi: A list of the SMARTS sub-structures that are searched for, which define what each element of the fingerprint marks as present or absent
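To make the link between the fingerprint elements and the sub-structure list concrete, here is a minimal sketch in plain Python. It does not call the package: the helper `present_substructures`, the labels and the bit values are all made up for illustration, and it simply assumes one label per fingerprint element, in the same order.

```python
def present_substructures(bits, labels):
    """Return the labels whose fingerprint element is set (non-zero)."""
    if len(bits) != len(labels):
        raise ValueError("one label is needed per fingerprint element")
    return [label for bit, label in zip(bits, labels) if bit]


# Hypothetical values for illustration only, not real ccsfp output.
labels = ["pattern_a", "pattern_b", "pattern_c"]
bits = [1, 0, 1]
print(present_substructures(bits, labels))  # ['pattern_a', 'pattern_c']
```

Reading a fingerprint this way is what makes it interpretable: every element that is switched on points back to a named chemical pattern.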
We can use the same ccs_fp function to define an arbitrary structural fingerprint for our own applications:
# Sub-structures to search for, given as SMARTS or SMILES strings
fingerprint_substructures = [
    "*c1ccccc1*",
    "*[CX3][CX3]*",
    "*[CX3][CX3][CX3]*",
    "*[CX3][CX3][CX3][CX3]*",
    "c1ccccc1CN",
]

# One name per sub-structure, in the same order
substructure_names = [
    "phenyl",
    "ethyl",
    "propyl",
    "butyl",
    "smiles_example",
]

fps, df_fps, smarts = fp.ccs_fp(
    smiles,
    substructures=fingerprint_substructures,
    substructure_names=substructure_names,
)

# Label each row with the molecule it describes
df_fps.index = smiles
Here we define the fingerprint using both SMARTS and SMILES strings as sub-structures and give each one a name. The result is a data frame of features based on these sub-structures, indexed by the input SMILES.
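Because each sub-structure is paired with a name by position, it can help to check that the two lists line up before building a fingerprint. The sketch below is plain Python and independent of the package; `pair_substructures` is a hypothetical helper of my own. It also shows a pitfall worth knowing: Python joins adjacent string literals, so a missing comma silently merges two patterns into a single entry.

```python
def pair_substructures(substructures, names):
    """Return (name, pattern) pairs, raising if the two lists do not line up."""
    if len(substructures) != len(names):
        raise ValueError(
            f"{len(substructures)} sub-structures but {len(names)} names"
        )
    return list(zip(names, substructures))


# Adjacent string literals are concatenated by Python, so the missing comma
# after the first pattern leaves this list with one entry instead of two.
merged = [
    "*[CX3][CX3]*"
    "c1ccccc1CN",
]
print(len(merged))  # 1

# With the comma in place the patterns and names pair up as intended.
print(pair_substructures(["*[CX3][CX3]*", "c1ccccc1CN"], ["ethyl", "smiles_example"]))
```

A length check like this turns a silent mislabelling of columns into an immediate error.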
The ccs_fp function therefore gives us a way to generate bespoke structural fingerprints from Python. Further examples of what the code can do can be found here.
Here we have shown that we can generate bespoke, chemical-structure-based fingerprints as representations for machine learning. You can see examples of this in some of my own work: "Chemical Space Analysis and Property Prediction for Carbon Capture Amine Molecules" and "Machine Guided Discovery of Novel Carbon Capture Solvents".
Hopefully this is something that you can make use of to generate interpretable representations of chemical systems for a variety of properties.
As usual, this article comes with the standard disclaimer, which can be found at the following link: https://medium.com/@james.l.mcdonagh/about