PubChem organizes its data into several record types that you can search and access. This section briefly introduces each data type to prepare you to search PubChem.
When a data source submits data about a chemical, that submission becomes a Substance record in PubChem. A Substance record can include chemical structures, synonyms, registration IDs, descriptions, related URLs, patent identifiers, database cross-references to PubMed, protein 3D structures, and biological screening results.
Every time a data source submits new information about a chemical, a new Substance record is generated. Substance summaries help you see who provided what.
This is an example of a Substance record for the chemical Semaglutide. The source of this Substance in PubChem is the chemical vendor ChemShuttle.
The Compound summary is an aggregated view of all available information in PubChem about a chemical.
This is an example of a Compound summary for Semaglutide. The summary combines information about Semaglutide from multiple sources; for example, it has Chemical Classes information from Drugs@FDA and the European Medicines Agency (EMA).
Read more about the difference between a Substance and a Compound record in the PubChem Documentation.
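The relationship between Substance records and a Compound summary can be illustrated with a small toy model. This is only a sketch under stated assumptions, not PubChem's actual processing: the `Substance` class, the `structure_key` field, the numeric IDs, and the source name "Example Vendor B" are all hypothetical, and the real system standardizes chemical structures rather than comparing simple labels.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Substance:
    """One depositor-submitted record (hypothetical, simplified fields)."""
    sid: int            # placeholder ID, not a real PubChem Substance ID
    source: str         # data source that submitted the record
    structure_key: str  # stand-in label for a standardized chemical structure
    synonyms: tuple     # names supplied by the data source


def aggregate_compounds(substances):
    """Group substances that share a structure key into compound-like views."""
    grouped = defaultdict(list)
    for sub in substances:
        grouped[sub.structure_key].append(sub)

    compounds = {}
    for key, subs in grouped.items():
        compounds[key] = {
            "sources": sorted({s.source for s in subs}),
            "synonyms": sorted({name for s in subs for name in s.synonyms}),
            "sids": sorted(s.sid for s in subs),
        }
    return compounds


# Three hypothetical submissions; the first two describe the same chemical.
records = [
    Substance(1, "ChemShuttle", "structure-A", ("Semaglutide",)),
    Substance(2, "Example Vendor B", "structure-A", ("semaglutide", "Semaglutide")),
    Substance(3, "Example Vendor B", "structure-B", ("Liraglutide",)),
]
compounds = aggregate_compounds(records)
```

In this model each submission keeps its own record, so you can still see who provided what, while the aggregated entry for `structure-A` lists both sources together.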
When a data source submits descriptions of biological assay experiments, along with the bioactivity test results for the substances tested, each experiment becomes a BioAssay record.
Here is an example of a BioAssay record in PubChem titled "Displacement of [125I]-GLP1 from human GLP1 receptor expressed in BHK cells after 2 hrs in absence of human serum albumin." The source of this BioAssay record is the ChEMBL database from the European Bioinformatics Institute.
Read more about BioAssays in the PubChem Documentation.
PubChem Protein and Gene records include the chemical information available for a given protein or gene, including bioactivity data for chemicals tested against that protein or gene. PubChem includes genes and proteins from different taxa. Gene and Protein records can include information from different sources.
This is an example of a Protein record for HLA class II histocompatibility antigen, DRB1 beta chain (human).
Here is an example of a Gene record for HLA-DRB1 – major histocompatibility complex, class II, DR beta 1 (human).
Learn more about genes and proteins in the PubChem Documentation.
PubChem Pathway summaries include information about chemicals, genes, or diseases involved in or associated with a biological pathway. The NIH National Human Genome Research Institute defines a biological pathway as "a series of actions among molecules in a cell that leads to a certain product or a change in the cell. It can trigger the assembly of new molecules, such as a fat or protein, turn genes on and off, or spur a cell to move."
This is an example of an Ibuprofen Metabolism Pathway summary from the academic source PathBank.
Learn more about Pathways in the PubChem Documentation.
A Cell Line summary presents PubChem data associated with a given cell line. A cell line is a population of cells that can be grown continuously in the laboratory, and cell lines are widely used in scientific research. Cell line information in PubChem can come from a variety of sources.
This is an example of a Cell Line summary for MCF-7.
Learn more about cell lines in the PubChem Documentation.
Taxonomy summaries in PubChem display data associated with a specific organism, like a human or Norway rat. Taxonomy summaries can include information from multiple sources.
This is an example of a Taxonomy record for Bactrocera oleae (olive fruit fly).
Learn more about taxonomies in the PubChem Documentation.
The PubChem Patent collection contains information about which chemicals are mentioned in a given patent document.
Here is an example of a Patent record titled "Process for purifying semaglutide and liraglutide."
Learn more about patents in the PubChem Documentation.
The table below summarizes PubChem Data Types:
Data Type | Description |
Substance | Submitted data about a chemical from a source |
Compound | An aggregated view of all available information in PubChem about a chemical |
BioAssays | Description of biological assay experiments and bioactivity test results on substances |
Targets: Proteins and Genes | Chemical information available for a given protein or gene (or protein encoded by the gene) |
Pathways | Information about chemicals, genes, or diseases involved in or associated with a biological pathway |
Cell Lines | Chemical information associated with a given cell line |
Taxonomy | Chemical information associated with a specific organism |
Patents | Chemical information mentioned in a given patent |
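
Each data type in the table has its own kind of record page on the PubChem website. The sketch below builds a page address from a data type and an identifier. The path segments and the overall URL pattern are assumptions based on how PubChem record pages commonly appear in a browser, so check them against the PubChem Documentation before relying on them. The `record_url` helper and the `PATH_SEGMENTS` mapping are hypothetical names introduced only for this illustration.

```python
from urllib.parse import quote

BASE_URL = "https://pubchem.ncbi.nlm.nih.gov"

# Data type -> assumed URL path segment for that type's record page.
PATH_SEGMENTS = {
    "Substance": "substance",
    "Compound": "compound",
    "BioAssay": "bioassay",
    "Protein": "protein",
    "Gene": "gene",
    "Pathway": "pathway",
    "Cell Line": "cell",
    "Taxonomy": "taxonomy",
    "Patent": "patent",
}


def record_url(data_type, identifier):
    """Return the assumed record-page URL for a data type and identifier."""
    if data_type not in PATH_SEGMENTS:
        raise ValueError(f"Unknown PubChem data type: {data_type!r}")
    # Percent-encode the identifier, keeping ':' because some identifiers
    # combine a source name and an accession with a colon.
    encoded = quote(str(identifier), safe=":")
    return f"{BASE_URL}/{PATH_SEGMENTS[data_type]}/{encoded}"


# NCBI Taxonomy IDs: 9606 is human and 10116 is the Norway rat.
human_url = record_url("Taxonomy", 9606)
rat_url = record_url("Taxonomy", 10116)
```

Keeping the mapping in one dictionary makes it easy to correct a path segment if the documentation shows a different pattern for a given data type.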