Exercise 2: Searching with PubChem Sketcher


Background

What if you don't know the name of a chemical, but have a structure and you would like to identify it or learn more about it?  Being able to search with that structure can be very helpful, for example:
    • An Analytical Chemist would like to identify a chemical isolated from a bioactive isolate. 
    • A Medicinal Chemist has sketched out what she thinks might be an interesting analog of a known drug and wants to see if she can find a vendor to purchase it.
    • An Organic Chemistry research student's experiment has gone awry, but he's identified the chemical structure and would like to know what he's made.
    • A Forensic Chemist has identified the structure of a novel compound found in a poisoning victim by a Medical Examiner and needs to know what it is.


Searching with PubChem Sketcher

PubChem lets you manually draw a structure and use it to search for identical or similar structures. PubChem Sketcher (similar to other chemical drawing tools, such as ChemDraw and ChemSketch) enables you to create a 2D structure, search the PubChem Compound structures with it, and find the matching record pages. To use PubChem Sketcher for this purpose:
  • Go to the PubChem homepage. You can search by structure from the start by clicking on Draw Structure (bottom left icon)
  • Manually draw a structure of interest, or input a SMILES, SMARTS, InChI, or InChIKey string
  • Click the "Search For The Structure" button
    Learn more here
PubChem Compound records are created from standardized chemical structures, and the information for each structure is aggregated into its record. So if you search with a structure, you may be able to find a single PubChem Compound record with everything we know about it, which can be quite a lot.


Searching with PubChem Sketcher - Salicylic Acid Example

  1. Start your search on the PubChem Homepage
  2. Type "salicylic acid" into the input and click on the Best Match result
  3. Click Find Similar Structures and Edit Structure to make changes to the chemical sketch
  4. Manually convert the phenol hydroxyl into an acetyl ester, which turns salicylic acid into aspirin. To build the acetyl group on the "O" atom attached to the benzene ring:
    • Click the button in the 4th row down and three buttons over to select it (it will turn orange), then click the "O" to add the group
    • Click the double bond button in the 2nd row down and two buttons over to select it, then click the middle of the downward-pointing bond to convert it to a double bond
    • Click the "O" in the modified periodic table in the middle of the button panel, then click at the bottom of that new double bond to replace the carbon with an "O"
    • Click the pull-down menu next to "Hydrogens", select "Add special", and then click the "Hydrogens" button to add the hydroxyl's "H"

As an alternative way to reach the salicylic acid structure, you can replace steps 2-4 above with:

2. Click on the "Draw Structure" icon below the text search box to activate the PubChem Sketcher window

Note: You can use the panel on the left to draw a structure from scratch, OR you can type/paste a SMILES, SMARTS, or InChI string into the text box at the top to pre-seed your structure

3. The SMILES string for salicylic acid is: O(C1=CC=CC=C1C(O[H])=O)[H] . Copy/paste this into the text box at the top of the EDIT STRUCTURE window and press the "return/enter" key on your keyboard. You should see the carbon backbone of the structure along with the "special atoms" - in this case "O" for the oxygens

4. Replace the SMILES string for Salicylic acid with O(C1=CC=CC=C1C(O[H])=O)C(C)=O


  5. Click the "Search for this structure" button to retrieve the record for the identical hit - it should look familiar to you!
  6. The PubChem Sketcher also has an export function, which includes the ability to create images.
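The SMILES edit in step 4 of the alternative approach amounts to swapping the explicit hydrogen on the phenol oxygen for an acetyl group. Here is a minimal Python sketch of that edit, using only the two SMILES strings given in the exercise. The function and variable names are illustrative, and this is plain string handling, not a chemistry-aware parser.

```python
# SMILES for salicylic acid, exactly as written in the exercise.
SALICYLIC_ACID = "O(C1=CC=CC=C1C(O[H])=O)[H]"

# Fragments involved in the edit (names are illustrative).
PHENOL_H = "[H]"     # explicit hydrogen on the phenol oxygen
ACETYL = "C(C)=O"    # acetyl group that takes its place


def acetylate_phenol(smiles: str) -> str:
    """Replace the trailing phenol hydrogen with an acetyl group.

    Assumption: the SMILES is written like the exercise string, with the
    phenol hydrogen as the final token. Other valid SMILES spellings of the
    same molecule would need a real cheminformatics toolkit.
    """
    if not smiles.endswith(PHENOL_H):
        raise ValueError("expected the phenol hydrogen at the end of the SMILES")
    return smiles[: -len(PHENOL_H)] + ACETYL


aspirin = acetylate_phenol(SALICYLIC_ACID)
print(aspirin)  # O(C1=CC=CC=C1C(O[H])=O)C(C)=O
```

The result is the same string you paste into the Sketcher in step 4, which is why the identical hit looks familiar.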
 

Take home learning:

Why else do people do structure searches?

As suggested in exercise 1, researchers may be interested in identifying similar structures to learn more about these structurally related chemicals, or they may want to create a dataset to serve as a combinatorial library of chemical structural analogs. This is often the first step for:

  • Assessing their potential for further study in computational pharmacological analysis
  • Ordering sets of compounds for doing biochemical or cellular bioactivity studies
  • Creating assays to differentiate and identify key compounds in environmental samples

How does structure searching work?

PubChem has a binary fingerprint dictionary of 881 chemical structure fragments. Each PubChem Compound structure is mapped onto this dictionary to create a fingerprint of chemical substructure "keys". Each key denotes the presence or absence of a particular substructure in the molecule. (Please note: the fingerprint does not consider variation in stereochemical or isotopic information.) Collectively, these binary keys provide a "fingerprint" of the valence-bond form of a particular chemical structure.
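To make the idea of presence/absence keys concrete, here is a minimal Python sketch. It uses a made-up five-key dictionary rather than PubChem's real one, and the fragments for each molecule are assigned by hand, whereas a real system detects them by substructure matching. All names here are hypothetical.

```python
from typing import Iterable, List

# Toy fragment dictionary: one key per list position.
# These five entries are invented for illustration; they are NOT PubChem's keys.
TOY_KEYS = ["benzene ring", "carboxylic acid", "phenol", "ester", "amine"]


def fingerprint(fragments_present: Iterable[str]) -> List[int]:
    """Return one bit per dictionary key: 1 if the fragment is present, else 0."""
    present = set(fragments_present)
    return [1 if key in present else 0 for key in TOY_KEYS]


# Fragments assigned by hand for the two molecules in this exercise.
salicylic_acid = fingerprint({"benzene ring", "carboxylic acid", "phenol"})
aspirin = fingerprint({"benzene ring", "carboxylic acid", "ester"})

print(salicylic_acid)  # [1, 1, 1, 0, 0]
print(aspirin)         # [1, 1, 0, 1, 0]
```

The two fingerprints share most bits and differ only where the phenol became an ester, which is the kind of overlap a similarity score measures.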

When a structure search is initiated, the query structure's fingerprint is compared to the fingerprint of each PubChem Compound structure, and a similarity value is calculated with the Tanimoto (or Jaccard) equation, so each compound pair receives a Tanimoto score. In PubChem, records are retrieved based on a Tanimoto threshold. A threshold of "100%" is effectively an "exact match" to the provided query structure (ignoring stereo and isotopic information) and often retrieves a single record. However, because stereochemical and isotopic information is lost in the calculation, a few isomers may also be retrieved. To retrieve sets of similar structures, the default setting of 90% returns a reasonable set of strongly similar structures, and you can adjust the threshold as low as 60% in the settings menu.
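For two binary fingerprints, the Tanimoto score is c / (a + b - c), where a and b are the numbers of keys set in each fingerprint and c is the number of keys set in both. The Python sketch below applies that formula to toy five-bit fingerprints and filters a small made-up library by threshold. The library, its names, and the handling of two empty fingerprints are assumptions for illustration, not PubChem's implementation.

```python
from typing import Dict, List, Sequence


def tanimoto(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """Tanimoto (Jaccard) similarity of two equal-length binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(fp_a)                                        # keys set in A
    b = sum(fp_b)                                        # keys set in B
    c = sum(1 for x, y in zip(fp_a, fp_b) if x and y)    # keys set in both
    union = a + b - c
    # Assumption: two all-zero fingerprints score 0.0 (conventions vary).
    return c / union if union else 0.0


def search(query: Sequence[int], library: Dict[str, Sequence[int]],
           threshold: float) -> List[str]:
    """Return the names of library entries scoring at or above the threshold."""
    return [name for name, fp in library.items()
            if tanimoto(query, fp) >= threshold]


query = [1, 1, 0, 1, 0]
library = {
    "A": [1, 1, 0, 1, 0],   # identical to the query -> 1.0
    "B": [1, 1, 1, 0, 0],   # 2 shared keys of 4 in the union -> 0.5
    "C": [1, 1, 0, 1, 1],   # 3 shared keys of 4 in the union -> 0.75
}

print(search(query, library, 0.9))   # ['A']
print(search(query, library, 0.6))   # ['A', 'C']
```

Lowering the threshold from 0.9 to 0.6 widens the hit list, which mirrors adjusting the similarity setting from 90% toward 60%.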

Please note that you can do a few different types of structure searching:

  • Identity:  Tanimoto Threshold of 100%

  • Similar:  retrieves structures that are 90% similar by default; this threshold is adjustable in the "Settings" menu

  • Substructure:  retrieves chemicals that have your structure (aspirin) as the base structure - with other atoms/groups attached

  • Superstructure:  retrieves chemicals whose structures make up portions of your structure

  • 3D Similarity:  retrieves structures that are similar in 3-dimensional space (learn more)
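The 2D search types above can also be requested programmatically. The sketch below only builds request URLs and does not send anything. It assumes PubChem's PUG REST service and its `fastidentity`, `fastsimilarity_2d`, `fastsubstructure`, and `fastsuperstructure` operations with a `Threshold` option. None of these appear in this exercise, so check the names and options against the current PUG REST documentation before relying on them. The helper name `build_search_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumed base URL of the PUG REST service (not part of this exercise).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Assumed mapping from the search types listed above to PUG REST operations.
SEARCH_OPERATIONS = {
    "identity": "fastidentity",
    "similarity": "fastsimilarity_2d",
    "substructure": "fastsubstructure",
    "superstructure": "fastsuperstructure",
}


def build_search_url(smiles: str, search_type: str,
                     threshold: Optional[int] = None) -> str:
    """Build (but do not send) a URL asking for the CIDs that match a SMILES query."""
    operation = SEARCH_OPERATIONS[search_type]
    # Percent-encode the SMILES so characters like "(", "=", "[" are URL-safe.
    encoded = quote(smiles, safe="")
    url = f"{PUG_REST}/compound/{operation}/smiles/{encoded}/cids/JSON"
    if search_type == "similarity" and threshold is not None:
        # Threshold is given in percent, e.g. 90 for the default described above.
        url += "?" + urlencode({"Threshold": threshold})
    return url


aspirin_smiles = "O(C1=CC=CC=C1C(O[H])=O)C(C)=O"
print(build_search_url(aspirin_smiles, "identity"))
print(build_search_url(aspirin_smiles, "similarity", threshold=90))
```

Keeping URL construction separate from the network call makes the mapping between search type and request easy to inspect before any query is run.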

Learn more about Structure Searching



Take-away Message

  • The PubChem Sketcher has many capabilities that make it a convenient tool for searching by structure, altering structure, and saving chemical structure images 
  • PubChem Sketcher and PubChem Compound were key resources for this exercise. Review PubChem-sponsored documentation, including this PubChem Sketcher Help Document, for more information

Last Reviewed: November 12, 2022