Exercise 2: Searching with PubChem Sketcher
Background
What if you don't know the name of a chemical, but you have a structure and would like to identify it or learn more about it? Being able to search with that structure can be very helpful. For example:
- An Analytical Chemist would like to identify a chemical isolated from a bioactive isolate.
- A Medicinal Chemist has sketched out what she thinks might be an interesting analog of a known drug and wants to see if she can find a vendor to purchase it.
- An Organic Chemistry research student's experiment has gone awry, but he's identified the chemical structure and would like to know what he's made.
- A Forensic Chemist has identified the structure of a novel compound found in a poisoning victim by a Medical Examiner and needs to know what it is.
Searching with PubChem Sketcher
PubChem lets you draw a structure manually and use it to search for identical or similar structures. PubChem Sketcher (similar to other chemical drawing tools, such as ChemDraw and ChemSketch) enables you to create a 2D structure, search the PubChem Compound structures with it, and find their record pages. To use PubChem Sketcher for this purpose:
- Go to the PubChem homepage. You can search by structure from the start by clicking on Draw Structure (bottom left icon)
- Manually draw a structure of interest or input SMILES, SMARTS, InChI, or InChIKey information
- Click the "Search For The Structure" button
Learn more here
Searching with PubChem Sketcher: Salicylic Acid Example
- Start your search on the PubChem Homepage
- Type "salicylic acid" into the input and click on the Best Match result
- Click Find Similar Structures and Edit Structure to make changes to the chemical sketch
- Replace the phenol with a carboxyl manually. To create an acetyl group where the "O" atom is attached to the benzene ring:
- Click the button in the 4th row down and three buttons over to select it (it'll turn orange) and then click the "O" to add the group
- Click the double bond in the 2nd row down and two buttons over to select and click on the middle of the downward pointing bond to convert it to a double bond
- Click the "O" in the modified periodic table in the middle of the button panel, and click at the bottom of that new double bond to replace the carbon with an "O"
- Click the pull-down menu next to "Hydrogens" and select "Add special" and then click the "Hydrogens" button to add the hydroxyl's "H"
Alternative approach to accessing the salicylic acid structure: you can replace steps 2-4 above with:
2. Click on the "Draw Structure" icon below the text search box to activate the PubChem Sketcher window. Note: You can use the panel on the left to draw a structure from scratch OR you can type/paste a SMILES, SMARTS or InChI string into the text box at the top to pre-seed your structure
3. The SMILES string for salicylic acid is: O(C1=CC=CC=C1C(O[H])=O)[H]. Copy/paste this into the text box at the top of the EDIT STRUCTURE window and hit the "return/enter" key on your keyboard. You should see the carbon backbone of the structure along with "special atoms" - in this case "O" for oxygens
- Click the "Search for this structure" button to retrieve the record for the identical hit - it should look familiar to you!
- The PubChem Sketcher also has an export function, which includes the ability to create images
Take home learning:
Why else do people do structure searches?
As suggested in Exercise 1, researchers may be interested in identifying similar structures to learn more about these structurally related chemicals, or they may want to create a dataset to serve as a combinatorial library of chemical structural analogs. This is often the first step for:
- Assessing their potential for further study in computational pharmacological analysis
- Ordering sets of compounds for doing biochemical or cellular bioactivity studies
- Creating assays to differentiate and identify key compounds in environmental samples
How does structure searching work?
PubChem has a binary fingerprint dictionary of 881 different chemical structure fragments. Each PubChem Compound structure is mapped onto this dictionary to create a fingerprint of chemical substructure "keys". Each key denotes the presence or absence of a particular substructure in a molecule. (Please note: The fingerprint does not consider variation in stereochemical or isotopic information.) Collectively, these binary keys provide a "fingerprint" of a particular chemical structure valence-bond form.
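To make the idea of presence/absence keys concrete, here is a minimal sketch in Python. The eight-entry FRAGMENT_KEYS dictionary and the fingerprint helper are hypothetical and far smaller than PubChem's real dictionary, and the fragments are assigned to each molecule by hand rather than detected from the structure, so treat this as an illustration of the data layout only.

```python
# Hypothetical toy dictionary of substructure keys (NOT PubChem's real keys).
FRAGMENT_KEYS = [
    "benzene ring",
    "carboxylic acid",
    "phenol hydroxyl",
    "ester",
    "methyl",
    "amine",
    "chlorine",
    "sulfur",
]


def fingerprint(fragments_present):
    """Return one bit per dictionary key: 1 if the fragment is present, else 0."""
    present = set(fragments_present)
    unknown = present - set(FRAGMENT_KEYS)
    if unknown:
        raise ValueError(f"fragments not in the dictionary: {sorted(unknown)}")
    return [1 if key in present else 0 for key in FRAGMENT_KEYS]


# Fragment lists are assigned by hand here; a real system would detect
# them from the structure itself.
salicylic_acid = fingerprint(["benzene ring", "carboxylic acid", "phenol hydroxyl"])
aspirin = fingerprint(["benzene ring", "carboxylic acid", "ester", "methyl"])

print(salicylic_acid)  # [1, 1, 1, 0, 0, 0, 0, 0]
print(aspirin)         # [1, 1, 0, 1, 1, 0, 0, 0]
```

Because every molecule is reduced to the same fixed-length list of bits, any two structures can be compared position by position, regardless of how different they look when drawn.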
When a structure search is initiated, the query structure's fingerprint is compared to each of the PubChem Compound structure fingerprints, and a similarity value is calculated for each compound pair from the Tanimoto (or Jaccard) equation. In PubChem, records are retrieved based on a Tanimoto threshold. A threshold of "100%" is effectively an "exact match" to the provided chemical structure query (ignoring stereo or isotopic information) and often retrieves a single record that matches the query structure. However, because stereochemical and isotopic information is lost in the calculation, a few isomers may also be retrieved. To retrieve sets of similar structures, the default setting of 90% retrieves a reasonable set of strongly similar structures, and you can adjust the threshold as low as 60% in the settings menu.
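The Tanimoto (Jaccard) score for two binary fingerprints is the number of keys set in both, divided by the number of keys set in at least one of them. The sketch below shows that calculation and a threshold filter in Python. The function names, the tiny library, and its hand-made five-bit fingerprints are all hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here, not something taken from PubChem.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two equal-length 0/1 fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    in_both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    in_either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Convention chosen for this sketch: two all-zero fingerprints score 0.0.
    return in_both / in_either if in_either else 0.0


def similarity_search(query_fp, library, threshold):
    """Return (name, score) pairs with score >= threshold, best match first."""
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical hand-made fingerprints, for illustration only.
toy_library = {
    "compound A": [1, 1, 1, 0, 0],
    "compound B": [1, 1, 0, 1, 1],
    "compound C": [0, 0, 0, 1, 1],
}
query = [1, 1, 1, 0, 0]

print(similarity_search(query, toy_library, threshold=1.0))
print(similarity_search(query, toy_library, threshold=0.4))
```

With a threshold of 1.0 only the identical fingerprint (compound A) is returned, which mirrors the "exact match" behavior described above. Lowering the threshold to 0.4 also admits compound B, which shares two of the five keys set across the pair.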
Please note that you can do a few different types of structure searching:
- Identity: Tanimoto Threshold of 100%
- Similar: retrieves structures that are 90% similar by default; this is adjustable in the "Settings" menu
- Substructure: retrieves chemicals that have your structure (aspirin) as the base structure - with other atoms/groups attached
- Superstructure: retrieves chemical groups that make up portions of the structure
- 3D Similarity: retrieves structures that are similar in 3-dimensional space (learn more)
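The same search types can also be requested programmatically. The sketch below only builds request URLs and does not contact PubChem. It assumes that PubChem's PUG REST service exposes these searches as the operations fastidentity, fastsimilarity_2d, fastsubstructure and fastsuperstructure, and that the SMILES query and a Threshold option can be passed as URL arguments; check both assumptions against the PUG REST documentation before relying on them. The build_search_url helper and the SEARCH_OPERATIONS table are hypothetical names introduced for this example, and 3D similarity is left out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Assumed mapping from the search types listed above to PUG REST operations.
SEARCH_OPERATIONS = {
    "identity": "fastidentity",
    "similar": "fastsimilarity_2d",
    "substructure": "fastsubstructure",
    "superstructure": "fastsuperstructure",
}


def build_search_url(search_type, smiles, threshold=None):
    """Build (but do not send) a structure-search URL that returns CIDs as JSON."""
    if search_type not in SEARCH_OPERATIONS:
        raise ValueError(f"unknown search type: {search_type!r}")
    params = {"smiles": smiles}
    if threshold is not None:
        if search_type != "similar":
            raise ValueError("a threshold only applies to similarity searches")
        params["Threshold"] = int(threshold)
    operation = SEARCH_OPERATIONS[search_type]
    # urlencode percent-encodes SMILES characters such as '(', '=' and '['.
    return f"{PUG_REST_BASE}/compound/{operation}/smiles/cids/JSON?{urlencode(params)}"


salicylic_acid = "O(C1=CC=CC=C1C(O[H])=O)[H]"
url = build_search_url("similar", salicylic_acid, threshold=90)
print(url)

# Round-trip check: decoding the query string recovers the original SMILES.
print(parse_qs(urlsplit(url).query)["smiles"][0] == salicylic_acid)
```

Passing the SMILES as a URL argument rather than inside the path avoids problems with characters such as "/" and "#", which have special meaning in a URL.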
Learn more about Structure Searching
Take-away Message
- The PubChem Sketcher has many capabilities that make it a convenient tool for searching by structure, altering structure, and saving chemical structure images
- PubChem Sketcher and PubChem Compound were key resources for this exercise. Review PubChem-sponsored documentation, including the PubChem Sketcher Help Document, for more information
Last Reviewed: November 12, 2022