PubChem Part 2: Searching with Structures
This section reviews how to find chemical information in PubChem using chemical structures. It includes:
- The basics of searching with line notations and drawings
- Identity searching
- Finding similar structures, substructures, and superstructures
Search with Structures: Line Notations
You can search PubChem with chemical structures to find exact matches or chemicals that share a similar structure or substructure.
PubChem recognizes structure drawings and line notations, including:
- SMILES identifier (Simplified Molecular-Input Line-Entry System)
- SMARTS identifier (SMILES Arbitrary Target Specification)
- InChI (IUPAC International Chemical Identifier)
Identifier | Common Abbreviation | Malonic Acid Example |
---|---|---|
Simplified Molecular-Input Line-Entry System | SMILES | C(C(=O)O)C(=O)O |
SMILES Arbitrary Target Specification | SMARTS | [#6](-[#6](=[#8])-[#8])-[#6](=[#8])-[#8] |
International Union of Pure and Applied Chemistry (IUPAC) International Chemical Identifier | InChI | InChI=1S/C3H4O4/c4-2(5)1-3(6)7/h1H2,(H,4,5)(H,6,7) |
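To see how an InChI string like the malonic acid example is organized, the short Python sketch below splits it into layers. It is a simplified illustration, not a full InChI parser: the function name `split_inchi` is hypothetical, and the sketch assumes a simple standard InChI in which the formula comes right after the version and each later layer appears once with a one-letter prefix.

```python
def split_inchi(inchi: str) -> dict:
    """Split a simple InChI string into its version, formula, and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("an InChI string must start with 'InChI='")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        # Later layers start with a one-letter prefix, such as
        # "c" (connectivity) or "h" (hydrogens).
        layers[part[0]] = part[1:]
    return layers


malonic_acid = split_inchi("InChI=1S/C3H4O4/c4-2(5)1-3(6)7/h1H2,(H,4,5)(H,6,7)")
for name, value in malonic_acid.items():
    print(name, value)
```

For malonic acid, the formula layer is C3H4O4, the "c" layer lists how the atoms are connected, and the "h" layer describes where the hydrogens sit.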
You should be viewing the PubChem homepage in your other browser window. If you are not, open the PubChem homepage before moving on.
Enter the SMILES identifier for calcium carbonate into the PubChem search box (hint: copy and paste):
C(=O)([O-])[O-].[Ca+2]
PubChem automatically recognizes that you’re searching for a structure and will try to identify the Compound. Relevant results are listed under the Identity tab.
For this structure, you should see one result for calcium carbonate with Compound CID 10112.
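The same lookup can also be run outside the web page through PubChem's PUG REST service. The Python sketch below only builds the request URL and reads a response; it does not contact PubChem. The URL pattern is assumed from the PUG REST documentation, the helper names `cid_lookup_url` and `extract_cids` are hypothetical, and the sample response is hand-written to match the expected shape, so verify all of these before relying on them.

```python
import json
from urllib.parse import urlencode

# Assumed PUG REST base address for Compound records.
BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound"


def cid_lookup_url(smiles: str) -> str:
    """Build a URL that asks for the CIDs matching a SMILES string."""
    # Passing the SMILES as a URL argument lets brackets, plus signs,
    # and equals signs be percent-encoded safely.
    query = urlencode({"smiles": smiles})
    return f"{BASE}/smiles/cids/JSON?{query}"


def extract_cids(response_text: str) -> list:
    """Pull the list of CIDs out of a JSON response body."""
    payload = json.loads(response_text)
    return payload.get("IdentifierList", {}).get("CID", [])


url = cid_lookup_url("C(=O)([O-])[O-].[Ca+2]")
print(url)

# Hand-written sample in the shape a CID lookup is expected to return.
sample_response = '{"IdentifierList": {"CID": [10112]}}'
print(extract_cids(sample_response))
```

Opening the printed URL in a browser should return the same Compound CID that appears under the Identity tab.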
Exercise
Let's try a couple of other searches using line notations.
Which amino acid has the structure CSCC[C@@H](C(=O)O)N (hint: copy and paste)?
Hint: Copy and paste the structure directly into PubChem. Look for the Compound name, not the molecular weight, listed on the Identity tab.
Which element is represented by InChI=1S/Cu (hint: copy and paste)?
Hint: Copy and paste the structure directly into PubChem, making sure to include the InChI= part. Look at the Compound listed on the Identity tab.
Search with Structures: Drawings
You can also search by manually drawing a structure using the PubChem Sketcher.
Return to the PubChem homepage by clicking on either the PubChem logo or the Search PubChem button in the upper right-hand corner of the screen.
Scroll below the search box and click on the Draw Structure icon.
The PubChem Sketcher should open in a new window.
This tutorial will walk you through some basic PubChem Sketcher functions and mechanics.
The PubChem Sketcher window has three parts:
- The left side of the window has buttons and controls for drawing.
- The right side of the window is where the structure drawing will appear.
- The top of the window has a search bar that will display a drawn structure as a line notation or other identifier.
There are two main ways to draw a structure:
- Input an identifier, like SMILES or SMARTS, into the search bar, or
- Use the buttons to manually draw a structure.
Let’s try using the InChI for benzaldehyde to draw a structure.
In the PubChem Sketcher window, change the SMILES drop-down menu to StdInChI.
Copy and paste the InChI into the search bar:
InChI=1S/C7H6O/c8-6-7-4-2-1-3-5-7/h1-6H
Press Enter on the keyboard to search.
The structure should appear in the white box below the search bar.
To search PubChem with this structure, click the blue Search for This Structure button at the bottom of the Sketcher window. Do this now.
You should now see the PubChem results page. In the search bar, the SMILES string for the query structure is shown. You should see one result for Benzaldehyde under the Identity tab.
Return to the PubChem homepage and click Draw Structure to return to the PubChem Sketcher window.
This time, let’s try drawing the structure for benzaldehyde in the PubChem Sketcher.
This exercise will introduce you to the basics of using the PubChem Sketcher.
Follow the steps below to duplicate this benzaldehyde structure drawing using the PubChem Sketcher:
1. Select the benzene ring button. It will turn yellow when selected.
2. Move the mouse to the right side of the window and click in the middle of the white space. The ring will appear when you click.
3. Select the propane symbol button. It will turn yellow when selected.
4. Click on the top of the benzene ring to place the propane symbol.
5. Select the mirror button. It will turn yellow when selected.
6. With the mirror button selected, click anywhere on the structure. It will flip the structure to match the direction of our drawing.
7. Select the double bond button. It will turn yellow when selected.
8. Click on the propane symbol to add the double bond.
9. Select the oxygen button. It will turn yellow when selected.
10. Click the top of the double bond symbol to add the oxygen symbol.
You've now created the structure for benzaldehyde.
Click the blue Search for This Structure button.
The PubChem results page should return one matching Compound, Benzaldehyde.
Search with Structures: Identity Search
What we just did with the benzaldehyde query is an identity search. An identity search returns compounds identical to the query molecule. When using identity search, you have some control over what is meant by "identical" compounds.
The default search considers two molecules to be identical if they have the same connectivity, isotopism, and stereochemistry. You can tell PubChem to ignore isotopism or stereochemistry by clicking the Settings button at the top right of the search results and selecting the appropriate option from the list.
When stereochemistry is ignored, compounds with the same connectivity and isotopism, but with varying stereochemistry, are returned. If isotopism is ignored, the identity search finds compounds with the same connectivity and stereochemistry, but with different isotopes.
For more information on using different definitions of chemical identity in an identity search to find stereoisomers and isotopomers, view this article.
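The identity definitions described above can also be expressed as a request to PubChem's PUG REST service. The Python sketch below maps the two "ignore" choices to an `identity_type` value and builds the matching URL without contacting PubChem. The `fastidentity` URL pattern and the `identity_type` values are taken from the PUG REST documentation as best understood here, and the function name `identity_search_url` is hypothetical, so treat the whole block as an assumption to check.

```python
from urllib.parse import urlencode

# Assumed PUG REST base address for Compound records.
BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound"

# (ignore_stereo, ignore_isotopes) -> assumed identity_type value.
IDENTITY_TYPES = {
    (False, False): "same_stereo_isotope",  # default: everything must match
    (True, False): "same_isotope",          # stereochemistry is ignored
    (False, True): "same_stereo",           # isotopism is ignored
    (True, True): "same_connectivity",      # only connectivity must match
}


def identity_search_url(smiles, ignore_stereo=False, ignore_isotopes=False):
    """Build an identity-search URL for the chosen definition of 'identical'."""
    identity_type = IDENTITY_TYPES[(ignore_stereo, ignore_isotopes)]
    query = urlencode({"smiles": smiles, "identity_type": identity_type})
    return f"{BASE}/fastidentity/smiles/cids/JSON?{query}"


# The amino acid structure from the earlier exercise, with stereochemistry ignored.
url = identity_search_url("CSCC[C@@H](C(=O)O)N", ignore_stereo=True)
print(url)
```

With stereochemistry ignored, a search like this is expected to return every stereoisomer of the query that shares its connectivity and isotopes, rather than a single Compound.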
Find Similar Structures, Substructures, and Superstructures
Once you’ve identified the structure you’re looking for, you can use links on the results page to jump to other structure searches.
Return to the PubChem homepage by clicking on either the PubChem logo or the Search PubChem button in the upper right-hand corner of the screen.
Search for the SMILES identifier:
CC(C)OP(=O)(C)F
PubChem should identify this as Sarin.
The search results displayed below the search box have five tabs: Identity, Similarity, Substructure, Superstructure, and 3D Similarity.
- Similarity: allows one to locate Compounds that are similar to a chemical structure query using pre-specified similarity thresholds.
- Substructure: allows one to locate chemical structures that contain a particular connectivity and valence-bond pattern.
- Superstructure: allows one to identify chemical structures that comprise or make up the provided chemical structure query.
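Each of the three searches listed above has a counterpart in PubChem's PUG REST service. The Python sketch below builds one URL per tab for the Sarin query without contacting PubChem. The operation names (`fastsimilarity_2d`, `fastsubstructure`, `fastsuperstructure`) and the URL pattern are assumed from the PUG REST documentation, and the function name `structure_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed PUG REST base address for Compound records.
BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound"

# Results tab -> assumed PUG REST operation name.
SEARCH_OPERATIONS = {
    "Similarity": "fastsimilarity_2d",
    "Substructure": "fastsubstructure",
    "Superstructure": "fastsuperstructure",
}


def structure_search_url(tab: str, smiles: str) -> str:
    """Build the URL for the search type that corresponds to a results tab."""
    operation = SEARCH_OPERATIONS[tab]
    query = urlencode({"smiles": smiles})
    return f"{BASE}/{operation}/smiles/cids/JSON?{query}"


sarin = "CC(C)OP(=O)(C)F"
urls = {tab: structure_search_url(tab, sarin) for tab in SEARCH_OPERATIONS}
for tab, url in urls.items():
    print(tab, url)
```

The three URLs differ only in the operation name, which mirrors how the results page runs each search on the same query structure.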
PubChem has automatically run those searches, and the results for each search type can be viewed by clicking the corresponding tab.
For example, click the Similarity tab. PubChem now displays the Similarity results.
Right below the Similarity tab is a description of the search: Fingerprint Tanimoto-based 2-dimensional similarity search.
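A Tanimoto score compares two fingerprints by dividing the number of features they share by the number of features present in either one. The Python sketch below works through that arithmetic on two made-up fingerprints. The bit positions and the 0.9 cutoff are illustrative assumptions, not real PubChem fingerprints or settings.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Return shared bits divided by the total number of distinct bits."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen for this sketch when both are empty
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Made-up fingerprints, written as the positions of the bits that are "on".
query = {1, 4, 7, 9, 12}
candidate = {1, 4, 7, 10, 12, 15}

score = tanimoto(query, candidate)  # 4 shared bits out of 7 distinct bits
print(round(score, 3))

# Illustrative cutoff: a candidate counts as a hit only at or above it.
is_hit = score >= 0.9
print(is_hit)
```

A score of 1.0 means the two fingerprints are identical, and lower scores mean fewer shared features.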
If you want to see results for 3D similarity, go back to the results page and click on the 3D Similarity tab.
For all of these structure searches, you can adjust some parameters to perform customized searches based on what you need. Click on the blue Settings button to see which customizations are available.
View step-by-step directions for different customized structure searches at the links below:
- Finding Drug-Like Compounds Similar to a Query Compound Through 2-D Similarity Search
- Finding Compounds Similar to a Query Compound Through 3-D Similarity Search
- Getting the Bioactivity Data for the Hit Compounds From Substructure Search
Conclusion
This concludes Part 2.
Close the NLM Navigator windows and continue the PubChem Tutorial.