PubChem Training Course

About PubChem


Data Types


PubChem organizes its data into several categories, called data types, that you can search and access. This section briefly introduces each data type to prepare you to search PubChem.

Substances

Data sources submit data about a chemical; that data becomes a Substance record in PubChem. A Substance record can include chemical structures, synonyms, registration IDs, descriptions, related URLs, patent identifiers, database cross-references to PubMed, protein 3D structures, and biological screening results.

Every time a data source submits new information about a chemical, a new Substance record is generated. Substance summaries help you see who provided what.

This is an example of a Substance record for the chemical Semaglutide. The source of this Substance in PubChem is the chemical vendor ChemShuttle.

PubChem Substance Record showing Semaglutide and ChemShuttle as the source.

Compounds

The Compound summary is an aggregated view of all available information in PubChem about a chemical.

This is an example of a Compound summary for Semaglutide. The summary combines information about Semaglutide from multiple sources; for example, its Chemical Classes information comes from Drugs@FDA and the European Medicines Agency (EMA).

PubChem Compound record for Semaglutide.
PubChem Compound record for Semaglutide showing Drugs@FDA and the European Medicines Agency circled under 3.3.1.
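
The relationship between Substance records and a Compound summary can be pictured as a grouping step. The sketch below is a toy model only, not PubChem's actual processing: it assumes each submission carries a `structure_key` (a hypothetical stand-in for a standardized chemical structure) and merges submissions that share a key. The class name, function name, SIDs, and the second data source are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstanceRecord:
    """One submission from one data source (toy model)."""
    sid: int              # hypothetical Substance ID
    source: str           # who submitted the data
    structure_key: str    # hypothetical stand-in for a standardized structure
    synonyms: tuple = ()  # names supplied by this source


def aggregate_compounds(substances):
    """Group substance records by structure and merge what each source provided."""
    grouped = defaultdict(list)
    for record in substances:
        grouped[record.structure_key].append(record)

    compounds = {}
    for key, records in grouped.items():
        compounds[key] = {
            "sids": sorted(r.sid for r in records),
            "sources": sorted({r.source for r in records}),
            "synonyms": sorted({name for r in records for name in r.synonyms}),
        }
    return compounds


# Illustrative data: the SIDs and "Example Vendor B" are made up.
submissions = [
    SubstanceRecord(1001, "ChemShuttle", "semaglutide-structure", ("Semaglutide",)),
    SubstanceRecord(1002, "Example Vendor B", "semaglutide-structure",
                    ("Semaglutide", "Ozempic")),
]

summary = aggregate_compounds(submissions)["semaglutide-structure"]
print(summary["sources"])
print(summary["sids"])
```

Each submission stays a separate Substance record, so you can still see who provided what, while the single aggregated entry plays the role of the Compound summary.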

Read more about the difference between a Substance and a Compound record in the PubChem Documentation.

BioAssays

When a data source submits descriptions of biological assay experiments and bioactivity test results for substances to PubChem, each experiment becomes a BioAssay record.

Here is an example of a BioAssay record in PubChem titled "Displacement of [125I]-GLP1 from human GLP1 receptor expressed in BHK cells after 2 hrs in absence of human serum albumin." The source of this BioAssay record is the ChEMBL database from the European Bioinformatics Institute.

PubChem BioAssay record for Displacement of [125I]-GLP1 from human GLP1 receptor expressed in BHK cells after 2 hrs in absence of human serum albumin.

Read more about BioAssays in the PubChem Documentation.

Targets: Genes and Proteins

PubChem Protein and Gene records include the chemical information available for a given protein or gene, including bioactivity data for chemicals tested against that protein or gene. PubChem has Gene and Protein records for many different taxa, and each record can include information from different sources.

This is an example of a Protein record for HLA class II histocompatibility antigen, DRB1 beta chain (human).

PubChem Protein record for HLA class II histocompatibility antigen, DRB1 beta chain (human).

Here is an example of a Gene record for HLA-DRB1 – major histocompatibility complex, class II, DR beta 1 (human).

PubChem Gene record for HLA-DRB1 – major histocompatibility complex, class II, DR beta 1 (human).

Learn more about genes and proteins in the PubChem Documentation.

Pathways

PubChem Pathway summaries include information about chemicals, genes, or diseases involved in or associated with a biological pathway. The NIH National Human Genome Research Institute defines a biological pathway as "a series of actions among molecules in a cell that leads to a certain product or a change in the cell. It can trigger the assembly of new molecules, such as a fat or protein, turn genes on and off, or spur a cell to move."

This is an example of an Ibuprofen Metabolism Pathway summary from the academic source PathBank.

PubChem Ibuprofen Metabolism Pathway summary from the academic source PathBank.

Learn more about Pathways in the PubChem Documentation.

Cell Lines

A Cell Line summary presents PubChem data associated with a given cell line. Cells in a cell line are often used in scientific research. Cell line information in PubChem can come from a variety of sources.

This is an example of a Cell Line summary for MCF-7.

PubChem Cell summary page for MCF-7.

Learn more about cell lines in the PubChem Documentation.

Taxonomy

Taxonomy summaries in PubChem display data associated with a specific organism, such as human or Norway rat. Taxonomy summaries can include information from multiple sources.

This is an example of a Taxonomy record for Bactrocera oleae (olive fruit fly).

PubChem Taxonomy record for Bactrocera oleae (olive fruit fly).

Learn more about taxonomies in the PubChem Documentation.

Patents

The PubChem Patent collection contains information on which chemicals are mentioned in a given patent document.

Here is an example of a Patent record titled "Process for purifying semaglutide and liraglutide."

PubChem Patent record titled Process for purifying semaglutide and liraglutide.

Learn more about patents in the PubChem Documentation.


Exercises

Answer these questions to check your understanding of the data types in PubChem:

  1. When a data source submits new information about a chemical, a _____ record is created.


  2. The summary page of a _____ record provides an aggregated view of all available information in PubChem about a chemical.


Summary

The table below summarizes PubChem Data Types:

Data Type                     Description
Substance                     Submitted data about a chemical from a source
Compound                      An aggregated view of all available information in PubChem about a chemical
BioAssays                     Description of biological assay experiments and bioactivity test results on substances
Targets: Genes and Proteins   Chemical information available for a given protein or gene (or protein encoded by the gene)
Pathways                      Information about chemicals, genes, or diseases involved in or associated with a biological pathway
Cell Lines                    Chemical information associated with a given cell line
Taxonomy                      Chemical information associated with a specific organism
Patents                       Chemical information mentioned in a given patent
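
If you later want to reach these data types from a script, the same categories can be pictured as URL paths. This course section does not describe programmatic access, so the sketch below is an assumption based on PubChem's PUG REST interface: the domain, namespace, and operation names in the mapping should be checked against the PUG REST documentation before use, and the function name `record_url` is invented here. Patents are left out because no matching path is assumed. The code only builds URLs and does not contact PubChem.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Assumed mapping: data type -> (domain, identifier namespace, operation or None).
# None means "no operation segment", which requests the full record.
PUG_REST_PATHS = {
    "substance": ("substance", "sid", None),
    "compound": ("compound", "cid", None),
    "bioassay": ("assay", "aid", "description"),
    "gene": ("gene", "geneid", "summary"),
    "protein": ("protein", "accession", "summary"),
    "pathway": ("pathway", "pwacc", "summary"),
    "cell line": ("cell", "cellacc", "summary"),
    "taxonomy": ("taxonomy", "taxid", "summary"),
}


def record_url(data_type, identifier, output="JSON"):
    """Build (but do not fetch) a PUG REST style URL for one record."""
    key = data_type.strip().lower()
    if key not in PUG_REST_PATHS:
        raise ValueError(f"no PUG REST mapping assumed for data type: {data_type!r}")
    domain, namespace, operation = PUG_REST_PATHS[key]
    # Percent-encode the identifier, keeping ':' which appears in some accessions.
    parts = [PUG_REST_BASE, domain, namespace, quote(str(identifier), safe=":")]
    if operation:
        parts.append(operation)
    parts.append(output)
    return "/".join(parts)


# Example identifiers: CID 2244 (aspirin) and taxonomy ID 9606 (human).
print(record_url("Compound", 2244))
print(record_url("Taxonomy", 9606))
```

Keeping the mapping in one dictionary mirrors the table above: each row becomes one entry, so a data type that is not mapped fails loudly instead of producing a guessed URL.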