PubChem Part 1: Basic Searching
Part 1 of this section will review how to find chemical information in PubChem starting with chemical names or identifiers, molecular formulas, gene symbols, proteins, pathways, taxons, cell lines, and patent numbers. Part 2 will review structure searching.
Search with Chemical Names or Identifiers
PubChem recognizes chemical names, many synonyms, and other identifiers such as the International Union of Pure and Applied Chemistry (IUPAC) name and the Chemical Abstracts Service (CAS) Registry Number.
Search With: | Example: |
---|---|
Chemical name | Neratinib |
Synonyms | Nerlynx; HKI-272 |
Other identifiers (IUPAC name) | (E)-N-[4-[3-chloro-4-(pyridin-2-ylmethoxy)anilino]-3-cyano-7-ethoxyquinolin-6-yl]-4-(dimethylamino)but-2-enamide |
Other identifiers (CAS number) | 698387-09-6 |
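Before pasting a CAS number into the search box, it can help to confirm that it was copied correctly. CAS numbers end in a check digit: the remaining digits are weighted 1, 2, 3, and so on from right to left, and the sum modulo 10 must equal the last digit. The sketch below applies that rule to the Neratinib CAS number from the table. The function name is hypothetical, and the check only confirms that a number is well formed, not that PubChem holds a record for it.

```python
import re


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is formatted like a CAS number and its check digit matches."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


print(is_valid_cas("698387-09-6"))  # True: Neratinib, as listed in the table
print(is_valid_cas("698387-09-5"))  # False: last digit mistyped
```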
Type citric acid into the search box on the PubChem homepage. As you type, PubChem will start to auto-fill a list of potential Compound, Gene, and Taxonomy names.
You can either click on the Compound you’re looking for in the auto-fill list, or press Enter on the keyboard to search.
Press Enter now to search for citric acid.
On the search results page, the first result at the very top of the page is the Best Match, which is the result that PubChem suggests is most relevant to your search.
To see how many Compound and Substance records PubChem has for citric acid, scroll below the Best Match result to the menu with tabs for Compounds, Substances, Genes, Pathways, and other data types.
The number indicates how many records PubChem has for each data type.
Scroll back up to the Best Match box on the results page. It includes identifiers for citric acid, including the Compound CID, molecular formula (MF), and molecular weight (MW).
The Compound ID (CID) is the Compound’s unique PubChem identifier. The Compound CID for citric acid is 311. To return to this Compound directly, you can search PubChem with the Compound CID.
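The same name-to-CID lookup can also be done programmatically through PubChem's PUG REST interface, which is not covered in this tutorial. The following is a minimal sketch, assuming the standard PUG REST URL layout (compound/name/.../cids and compound/cid/.../property). The helper names are hypothetical. The example only builds and prints the request URLs; fetch_json is provided for use when network access is available.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name: str) -> str:
    """Build the URL that asks PUG REST for the CIDs matching a chemical name."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def cid_properties_url(cid: int) -> str:
    """Build the URL that asks for the molecular formula and weight of one CID."""
    return (f"{PUG_REST}/compound/cid/{cid}"
            "/property/MolecularFormula,MolecularWeight/JSON")


def fetch_json(url: str) -> dict:
    """Download a URL and decode the JSON body (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


print(name_to_cids_url("citric acid"))
print(cid_properties_url(311))
```

Calling fetch_json on the first URL should return the CID list for citric acid, and on the second the MF and MW values shown in the Best Match box.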
Exercise
Search PubChem to answer the questions below:
What is the Compound CID for Apixaban?
Incorrect!
503612-47-3 is Apixaban’s CAS number. Look for the PubChem CID.
Correct!
That is correct!
Incorrect!
Eliquis is a synonym for Apixaban. Look for the PubChem CID.
Incorrect!
BMS-562247 is a different identifier for Apixaban. Look for the PubChem CID.
Which Compound has the CID of 2244?
Correct!
That is correct!
Incorrect!
2244 is the Substance ID number for Cytidylate. Look for the Compound that is the Best Match when you search for 2244.
Incorrect!
2244 is the BioAssay AID number for "A Cell Based Secondary Assay to Explore Cytotoxicity of West Nile Virus Anti-Viral Synthesized/Analog Compounds." Look for the Compound that is the Best Match when you search for 2244.
Incorrect!
Check that you’re searching the correct ID. Look for the Best Match result that appears at the top of the results with this CID.
Open the PubChem record for Aspirin (CID 2244) by clicking on the record title (the link labeled Aspirin; ACETYLSALICYLIC ACID; 50-78…).
PubChem pages have a few features that help you navigate to the information you need:
- A Contents menu with sections and subsections that organize different types of data, like BioAssay results and gene or protein targets.
- Tooltips that describe what each section and subsection contains. Look for the question mark icon beside a section’s title and click on it to reveal details about that section.
On the Compound summary page for Aspirin (CID 2244), find the Contents menu located on the right-hand side of the page or below the summary box. This menu displays the different sections of available information in PubChem.
Find the Biological Test Results section and click on the small arrow beside it to view its subsections. For Aspirin, this section has one subsection, BioAssay Results.
Click on BioAssay Results to view available bioactivity information in PubChem about Aspirin.
Search with Molecular Formula
You can also search with a chemical’s molecular formula.
Return to the PubChem homepage by clicking on either the PubChem logo or the Search PubChem button in the upper right-hand corner of the screen.
Search PubChem for C7H6O3 (hint: copy and paste the formula and press Enter).
Notice that the results page looks different this time. Below the search box is a note that PubChem is "Treating this as a molecular formula query." This means PubChem automatically detects that you’re searching with a molecular formula and will retrieve results that match the formula.
Salicylic acid is the top result. The molecular formula is listed in the "MF" field on the results page.
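To illustrate what treating a query as a molecular formula involves, here is a minimal sketch that checks whether a string is shaped like a simple formula, counts its atoms, and sums a molecular weight. This is an illustration only, not PubChem's actual detection logic. The function names and the small atomic-weight table (rounded standard values for the elements in this tutorial's examples) are assumptions, and a fuller version would validate element symbols against the periodic table and handle parentheses and charges.

```python
import re

# Rounded standard atomic weights for the elements used in this tutorial's examples.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "O": 15.999, "P": 30.974}

# A simple formula is a run of element symbols, each optionally followed by a count.
FORMULA_PATTERN = re.compile(r"(?:[A-Z][a-z]?\d*)+")
TOKEN_PATTERN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(text):
    """Return {element: count} if `text` is shaped like a formula, else None."""
    text = text.strip()
    if not text or FORMULA_PATTERN.fullmatch(text) is None:
        return None
    counts = {}
    for symbol, number in TOKEN_PATTERN.findall(text):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(counts):
    """Sum atomic weights; raises KeyError for elements missing from the table."""
    return sum(ATOMIC_WEIGHTS[symbol] * n for symbol, n in counts.items())


print(parse_formula("C7H6O3"))       # {'C': 7, 'H': 6, 'O': 3}
print(parse_formula("citric acid"))  # None: not shaped like a formula
print(round(molecular_weight(parse_formula("C7H6O3")), 2))
```

The computed weight for C7H6O3 can be compared with the MW value PubChem displays for salicylic acid.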
Exercise
Search PubChem to answer the questions below:
How many Compounds does PubChem retrieve when you search C18H40O4P2?
Incorrect!
Copy and paste the formula directly into the PubChem search.
Correct!
That is correct!
What is the molecular formula for SynuClean-D?
Incorrect!
This is the Compound CID for SynuClean-D. Look for the molecular formula.
Correct!
That is correct!
Incorrect!
This is the molecular weight of SynuClean-D. Look for the molecular formula.
Incorrect!
This is a different identifier for SynuClean-D. Look for the molecular formula.
Search with Genes or Proteins
Return to the PubChem homepage by clicking on either the PubChem logo or the Search PubChem button in the upper right-hand corner of the screen.
If a gene or protein target has been tested in a BioAssay or is involved in a Pathway, then it will have a record in PubChem.
For example, if you want to find information about the Vitamin D Receptor gene, search PubChem for Vitamin D Receptor. As you type, PubChem will start to auto-fill a list of potential Gene names. You can either click on the Gene you’re looking for in the auto-fill list, or press Enter on the keyboard to search.
Click on vitamin D receptor under Gene to continue.
Below the search box are the different PubChem data types with a count of how many summaries are available for each one.
If it’s not already selected, click on the Genes tab to view your options.
Now you should see results for summaries about the Vitamin D Receptor. Each result includes the organism it corresponds to in parentheses at the end of the title. For example, one result is for vdrb – vitamin D receptor b (zebrafish).
In the list, locate the summary for VDR – vitamin D receptor (human).
You can already see a lot of information about the gene on the results page, including how many linked BioAssays and linked Pathways it has in PubChem. These numbers appear in blue boxes with the summary on the results page. Clicking on those numbers will take you directly to a list of related BioAssay and Pathway PubChem summaries.
Click on VDR – vitamin D receptor (human) to view the full record. Use the Contents menu on the right-hand side of the screen or below the summary box to jump to different parts of the summary.
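A Gene summary can also be requested by gene symbol through PUG REST. The sketch below assumes the gene/genesymbol/.../summary URL form and, as far as we know, that form returns the human gene when no organism is given. Treat both as assumptions to verify against the PUG REST documentation. The helper name is hypothetical, and the example only builds the URL.

```python
from urllib.parse import quote

# Base address of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def gene_summary_url(symbol: str) -> str:
    """Build the URL for a PubChem Gene summary looked up by official gene symbol."""
    return f"{PUG_REST}/gene/genesymbol/{quote(symbol, safe='')}/summary/JSON"


# VDR is the symbol of the human vitamin D receptor gene used in this section.
print(gene_summary_url("VDR"))
```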
Exercise
Locate the Gene page for TBX2 – T-box transcription factor 2 (human) to answer the following questions:
Which protein target does PubChem list as being mapped to the TBX2 – T-box transcription factor 2 (human) gene target? (Hint: Locate the Proteins section of the Contents menu and look for Protein Targets)
Incorrect!
This is the Gene ID. Look for the Protein Targets section.
Incorrect!
This is the Gene symbol. Look for the Protein Targets section.
Correct!
That is correct!
Incorrect!
This is the number of the Protein Targets section. Look for the Protein Target number.
Which Pathway is associated with this gene? (Hint: Look for the Interactions and Pathways section of the Contents menu)
Correct!
That is correct!
Incorrect!
Look for the Pathways section.
Search with Pathways
If you need to know which chemicals or genes interact with a specific biological pathway, you can find that in PubChem.
Return to the PubChem homepage by clicking on either the PubChem logo or the Search PubChem button in the upper right-hand corner of the screen.
Search PubChem for lidocaine metabolism.
On the results page, you’ll see at least three Pathway records. When new pathway data for any taxonomy is added to PubChem, a new Pathway record is created.
In the list for lidocaine metabolism, you see a result from the source PathBank for the Homo sapiens (human) taxonomy. You also see two entries from the source WikiPathways, one for the Bos taurus (cattle) taxonomy and one for the Homo sapiens (human) taxonomy. You can read more about how Pathway records are organized in PubChem in this article.
The results page also displays how many Compound, Gene, and Protein records are linked to each Pathway. These are displayed as blue boxes with numbers.
Click on the record for Lidocaine (Local Anaesthetic) Metabolism Pathway from PathBank.
Use the Contents menu to jump to Interactions, Chemicals, Proteins, or Genes involved in this Pathway. Additionally, some Pathways will have linked Related Pathways.
Exercise
Locate the Pathway summary page for Glycolysis and Gluconeogenesis from the data source INOH for the taxonomy Homo sapiens (human) to answer the following questions:
This Pathway has Compound, Gene, and Protein records linked to it in PubChem. True or false?
Correct!
That is correct!
Incorrect!
On the results page, look for the Counts of different types of data linked to this Pathway. Which types are listed?
The chemical Phosphoric Acid is listed in PubChem as being involved with the Glycolysis and Gluconeogenesis Pathway. True or false?
Correct!
That is correct!
Incorrect!
Look for the Chemicals section on the Contents menu. Do you see Phosphoric Acid?
Search with Taxons
Taxonomy summaries include data available in PubChem associated with a specific organism. This includes biological experiments archived in PubChem BioAssay that were conducted against the organism as a whole or against one of its genes or proteins, as well as the compounds tested in those experiments.
Return to the PubChem homepage by clicking on either the PubChem logo or the Search PubChem button in the upper right-hand corner of the screen.
Search for malaria.
As you type, PubChem will start to auto-fill a list of potential Taxonomy names. You can either click on the Taxonomy you’re looking for in the auto-fill list, or press Enter on the keyboard to search.
Press Enter now to search for malaria.
On the search results page, select the Taxonomy tab to see Taxonomy results.
PubChem has multiple results for malaria, like Plasmodium falciparum (malaria parasite P. falciparum) and Plasmodium vivax (malaria parasite P. vivax). Each result may display a Linked BioAssay Count, Linked Proteins, and Linked Pathways with the number for each result connected to this organism. These are shown as blue boxes with numbers.
Click on Plasmodium falciparum (malaria parasite P. falciparum) to view the full summary.
Use the Contents menu to jump to related Chemicals and Bioactivities, BioAssays, and other information.
Exercise
Locate the PubChem record for the human papillomavirus 16 taxonomy to answer the following question:
What is the pathway protein associated with the human papillomavirus 16 taxonomy in PubChem?
Incorrect!
This is a Whole-Organism BioAssay. Look for the section about Pathway Proteins.
Incorrect!
This is a related taxonomy. Look for the section about Pathway Proteins.
Incorrect!
This is the MeSH Entry Term. Look for the section about Pathway Proteins.
Correct!
That is correct!
Search with Cell Lines
Cell summaries include PubChem data related to cell lines, such as compounds and bioassays tested against the cell line and drug sensitivity data.
Search PubChem for T-47D, which is a cell line used for breast cancer research.
The results page shows Cell Line records by default.
Select the record for T-47D.
Use the Contents menu to jump to related Chemicals and Bioactivities, BioAssays, and other information.
Exercise
Answer the following question about the T-47D cell line record in PubChem:
PubChem includes drug sensitivity data for the drug Tamoxifen and T-47D. True or false?
Correct!
That is correct.
Incorrect!
Use the Contents menu to locate the Drug Sensitivity section. Is Tamoxifen listed there?
Search with Patents
Patent summaries include PubChem data related to specific patents, like PubChem Compounds and PubChem Substances linked to a patent.
The simplest way to find a patent in PubChem is to search for its patent number.
Search PubChem for the patent number US-9072661-B2.
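Patent numbers are often written with different spacing or without hyphens. The sketch below normalizes such variants to the hyphenated country-number-kind form used in this tutorial (for example, US-9072661-B2). The pattern and the names are assumptions for illustration. They cover the common layout of a two-letter country code, a run of digits, and a kind code such as B2 or A1, and they are not a description of every format PubChem accepts.

```python
import re
from typing import NamedTuple, Optional


class PatentId(NamedTuple):
    country: str  # two-letter country code, e.g. "US"
    number: str   # publication number digits
    kind: str     # kind code, e.g. "B2" or "A1"


# Country code, digits, kind code, with optional hyphens or spaces between parts.
PATENT_PATTERN = re.compile(r"([A-Z]{2})[-\s]?(\d+)[-\s]?([A-Z]\d?)")


def parse_patent_id(text: str) -> Optional[PatentId]:
    """Split a patent number into its parts, or return None if it does not fit."""
    match = PATENT_PATTERN.fullmatch(text.strip().upper())
    return PatentId(*match.groups()) if match else None


def to_hyphenated(patent: PatentId) -> str:
    """Format a parsed patent number in the hyphenated style used in this tutorial."""
    return "-".join(patent)


print(to_hyphenated(parse_patent_id("us 9072661 b2")))  # US-9072661-B2
print(parse_patent_id("Injectable ibuprofen"))          # None
```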
This retrieves one result for a patent titled "Injectable ibuprofen formulation."
The summary result shows how many Compounds and Substances are linked to this patent. These counts are shown as blue boxes with numbers in them.
Click on the title to view the full record.
In the Contents menu, click on the Linked Chemicals section.
The Linked Chemicals section displays PubChem Compound and PubChem Substance records mentioned in the patent.
Exercise
Locate the Patent summary in PubChem for the patent with number US-2022379724-A1 titled "Devices, methods and systems related to wearable patch having blood alcohol content detector" and answer the following question:
Which PubChem Compound is linked to this patent in PubChem?
Correct!
That is correct.
Incorrect!
Use the Contents menu to locate the Linked Chemicals section. What chemicals are listed there?
Incorrect!
This is a Substance ID number linked to this patent. Look for the PubChem Compounds section.
Conclusion
This concludes Part 1. You now know how to find chemical information in PubChem starting with chemical names or identifiers, molecular formulas, gene symbols, proteins, pathways, taxons, cell lines, and patent numbers.
Close the NLM Navigator windows and continue to Part 2 of the PubChem Tutorial.