Exercise 1: Text Searching in PubChem
Background
People often know the name of a chemical and need to learn specific information about it. Some examples include:
- A High School Chemistry teacher might want to find the chemical structure of Viagra.
- An Organic Chemistry Lab student might want to know the melting point of Acetylsalicylic Acid.
- A Biological Research assistant might want to learn about the known pharmacological properties of cisplatin.
- A Nursing Assistant might want to learn more about available dosages for Tamoxifen Citrate.
- A Laboratory Manager might want to find chemical safety data and toxicity information about Sodium Dodecyl Sulfate.
PubChem Compound records aggregate information for each unique chemical structure, then add synonyms provided to us by Submitters of chemical information, as well as identifiers from Contributors (UNII, EC, ICSC Number, RXCUI, etc.). From the chemical structure itself, we calculate structure-based names such as the IUPAC Name, Canonical SMILES, InChI, and InChIKey, and these are also available. You can search with any of these and more to find the chemical record(s) you are looking for!
Searching in PubChem
You can search PubChem for chemicals by name, molecular formula, structure, or other common identifier.
Example Search Strategies:
- Keyword or names
- Molecular Formula
- Simplified Molecular Input Line Entry System (SMILES)
- International Chemical Identifier (InChI)
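The same kinds of identifiers can also be used outside the web page. The sketch below is only an illustration and is not part of this exercise: it assumes you want to use PubChem's PUG REST service, which this page does not cover, and it only builds the lookup URLs without sending any request. The helper name `property_url` is made up for this example, and SMILES strings that contain characters such as "/" may need to be sent a different way.

```python
from urllib.parse import quote

# Base address of PubChem's PUG REST service (an assumption: not described on this page).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(namespace, identifier, properties):
    """Build a PUG REST URL that asks for some properties of one compound.

    property_url is a hypothetical helper written for this sketch.
    """
    # Percent-encode the identifier so characters such as '=', '(' or spaces
    # are safe inside the URL path.
    encoded = quote(identifier, safe="")
    prop_list = ",".join(properties)
    return f"{PUG_REST}/compound/{namespace}/{encoded}/property/{prop_list}/JSON"


props = ["MolecularFormula", "MolecularWeight", "IUPACName"]

# Search by name, by SMILES, and by InChIKey.
print(property_url("name", "aspirin", props))
print(property_url("smiles", "CC(=O)OC1=CC=CC=C1C(=O)O", props))
print(property_url("inchikey", "RWWYLEGWBNMMLJ-YSOARWBDSA-N", props))
```

Pasting one of the printed URLs into a browser should return the requested properties as JSON, if the service accepts the identifier.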
Searching with Entrez
For some advanced searches, you can use the standard Entrez Search Syntax (Entrez is NCBI's primary text search and retrieval system), which allows you to search within Indexed Filters and Fields, such as:
- has_3d_conformer[filter]: records that have 3D conformers
- has_dailymed[filter]: records that include DailyMed drug/medication info
- has_pharm[filter]: records with curated information about pharmacological action
- has_patent[filter]: records with associated patent info
- has_src_vendor[filter]: records with vendor info listed
- "anti inflammatory agents, non steroidal"[pharmaction]: records with annotated NSAID activity
- "pt"[element]: records for compounds that contain platinum
- "chemical vendors"[sourcecategory]: records that have at least 1 chemical vendor listed
- 150.00:182.00[molecularweight]: records for compounds that are between 150.00 and 182.00 g/mol
Learn more about using Entrez Indices and Filters in PubChem
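The filters and fields above all follow the same pattern: a value, then the index name in square brackets, with ranges written as low:high. As a rough sketch of that pattern, the Python helpers below assemble such terms as plain text. The function names (`field_term`, `range_term`, `all_of`) are invented for this illustration, and the quoting rule (add quotes when the value contains a space or comma) is a simplifying assumption rather than a documented Entrez rule.

```python
def field_term(value, field):
    """Return value[field], quoting values that contain spaces or commas."""
    text = str(value)
    if " " in text or "," in text:
        text = f'"{text}"'
    return f"{text}[{field}]"


def range_term(low, high, field):
    """Return low:high[field] for a numeric range such as molecular weight."""
    return f"{low:.2f}:{high:.2f}[{field}]"


def all_of(*terms):
    """Combine terms so that a record must match every one of them."""
    return " AND ".join(terms)


query = all_of(
    field_term("anti inflammatory agents, non steroidal", "pharmaction"),
    field_term("has_patent", "filter"),
    range_term(150, 182, "molecularweight"),
)
print(query)
```

The printed text can be pasted into the PubChem search box in the same way as the hand-typed examples.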
Text Searching in PubChem: Entrez Example
- Start your search on the PubChem Homepage.
- Click the "Use Entrez" check box under the search box
- Type or copy/paste in "anti inflammatory agents, non steroidal"[pharmaction]
- Narrow the list to single chemical NSAIDs with: AND 1[CovalentUnitCount]
- You can further narrow the list to smaller chemical molecules (<400 g/mol) with: AND 0:400[molecularweight]
- Or narrow down your search by including a simple term: AND aspirin
- We want to learn more about aspirin, so review the results and answer the following questions:
Note: If you get stuck, review the section "What information can you find for your chemical?" below or ask the instructors for help!
- What are synonyms for aspirin?
- What is the molecular formula?
- What is the molecular weight?
- Are there any patents associated with the entry?
- Do you recognize any of the information sources?
- Click on the LCSS Datasheet and review the information
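The narrowing steps in this example can be written out as a short Python sketch that shows how each added clause extends the query. It also builds a URL for NCBI's E-utilities `esearch` service; that part is an assumption on our side, since this exercise only uses the web page. In particular, the database name `pccompound` and the idea that E-utilities accepts the same fields as the web search are taken as given here and should be checked before you rely on them. No request is sent.

```python
from urllib.parse import urlencode

# E-utilities search endpoint (an assumption: not part of this exercise).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The clauses from the steps above, in the order they were added.
steps = [
    '"anti inflammatory agents, non steroidal"[pharmaction]',
    "1[CovalentUnitCount]",
    "0:400[molecularweight]",
    "aspirin",
]


def esearch_url(term):
    """Build an esearch URL for a query against the PubChem Compound database."""
    params = {"db": "pccompound", "term": term, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# Each query is the previous one with one more clause joined by AND.
queries = [" AND ".join(steps[: i + 1]) for i in range(len(steps))]
for number, text in enumerate(queries, start=1):
    print(f"Step {number}: {text}")

print(esearch_url(queries[-1]))
```

Each added clause can only keep or shrink the result list, which is why the steps move from a broad class of drugs toward a single compound.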
What information can you find for your chemical?
Once you've found a record, take a look at the expandable Table of Contents. Everything we know about the chemical, which in many cases is quite a lot, is listed in this hierarchical structure. If it isn't listed there, we do not have the information in PubChem.
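To picture what "hierarchical" means here, the sketch below walks a small nested table of contents in Python and prints it as an indented outline. The record is a hand-written miniature for illustration only, not real PubChem output. The key names `Record`, `Section`, and `TOCHeading` follow our understanding of the JSON that PubChem's PUG View service returns, which is an assumption this page does not confirm.

```python
def list_headings(sections, depth=0):
    """Yield (depth, heading) pairs for a nested table of contents."""
    for section in sections:
        yield depth, section["TOCHeading"]
        # Subsections, when present, are stored under the same "Section" key.
        yield from list_headings(section.get("Section", []), depth + 1)


# Hand-written miniature record for illustration only; not real PubChem data.
sample_record = {
    "Record": {
        "RecordTitle": "Example compound",
        "Section": [
            {
                "TOCHeading": "Structures",
                "Section": [{"TOCHeading": "2D Structure"}],
            },
            {"TOCHeading": "Names and Identifiers"},
            {"TOCHeading": "Safety and Hazards"},
        ],
    }
}

for depth, heading in list_headings(sample_record["Record"]["Section"]):
    print("  " * depth + heading)
```

A real record has many more sections and levels, but the same walk applies: a heading that does not appear anywhere in the outline means the record has no data of that kind.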
Where does the information come from?
Experimental Chemical and Physical properties are provided to us by Contributors. Additional "annotated information" has also been contributed such as pharmacological actions, toxicology data, safety and health information, etc. The source of the information is listed just below the information in green.
If you find information that you think should be corrected, keep in mind that only the authors of the contributed information can review and update it. You can click on the green name to learn more about where the data came from, and you will often find a link to the original information on the Contributor's webpage.
Laboratory Chemical Safety Summary (LCSS) Datasheets
The LCSS format is based on the format described by the National Research Council in the publication "Prudent Practices in the Laboratory: Handling and Management of Chemical Hazards" (2011) and contains pertinent chemical hazard and safety information about the chemical described in the PubChem record. Not all PubChem Compound records have LCSS datasheets; they are only available when a Globally Harmonized System of Classification and Labeling of Chemicals (GHS) Classification exists for that chemical.
Currently there are over 167,000 PubChem Compound records with linked LCSS datasheets.
Take home exercise: Learn more about a chemical!
- Start your search on the PubChem Homepage.
- Type in sodium hydroxide
- Scroll down the results page to see that there are a lot of matching records
- Back at the very top, we've suggested a Best Match! Click the blue aggregate name to get to the PubChem Compound record for sodium hydroxide
- Look at the Contents (table of contents) at the top right to see the sections within the record and jump quickly to any of them
- You can find and download images of 2D and 3D structures
- You can find calculated and experimental physical and chemical properties
- You can learn about any known biological activities in the Pharmacology and Biochemistry section
- You should review the Laboratory Chemical Safety Summary (LCSS) Datasheet for this corrosive chemical
Here are other example text searches to try and types of information you may want to find:
Search with: | Look for:
tamoxifen | the structure of this chemical
RWWYLEGWBNMMLJ-YSOARWBDSA-N | what this chemical is often used for
p-nitrophenyl phosphate | what a 1H & 13C-NMR of the pure compound looks like
acylamide | a Laboratory Chemical Safety Summary (LCSS) Datasheet
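The example searches above mix plain names with an InChIKey, and it can help to recognize which kind of identifier you are looking at. The Python sketch below is a rough heuristic written for this illustration, not a description of how PubChem itself interprets search text. It relies on two facts about the formats (an InChIKey is 14 letters, a hyphen, 10 letters, a hyphen, and 1 letter; an InChI string starts with "InChI=") plus a simple pattern for molecular formulas that will misjudge some inputs, such as short SMILES strings.

```python
import re

# 14 uppercase letters, 10 uppercase letters, 1 uppercase letter.
INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
# One or more element symbols, each optionally followed by a count (e.g. C9H8O4).
FORMULA = re.compile(r"^(?:[A-Z][a-z]?\d*)+$")


def guess_identifier_type(text):
    """Guess whether a search string is an InChI, InChIKey, formula, or name."""
    text = text.strip()
    if text.startswith("InChI="):
        return "inchi"
    if INCHIKEY.match(text):
        return "inchikey"
    if FORMULA.match(text):
        return "formula"
    # Anything else is treated as a keyword or chemical name.
    return "name"


examples = [
    "tamoxifen",
    "RWWYLEGWBNMMLJ-YSOARWBDSA-N",
    "p-nitrophenyl phosphate",
    "C9H8O4",
]
for query in examples:
    print(f"{query}: {guess_identifier_type(query)}")
```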
Take home exercise: Finding similar structures to download as a dataset
- Go back to the PubChem Compound record for aspirin.
- In that top summary box you can click on "Find Similar Structures" to go to a page that will allow you to retrieve datasets of other chemicals that are similar.
- Identity: retrieves the exact structure
- Similar: retrieves structures that are 90% similar (this is adjustable with "Settings" - on the right)
- Substructure: retrieves chemicals that have your structure (aspirin) as the base structure - with other atoms/groups attached
- Superstructure: retrieves chemical groups that make up portions of the structure
- 3D Similarity: retrieves structures that are similar in 3-dimensional space (learn more)
Learn more about Advanced Structure Searching
- To download the dataset, click the "Download" button
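To give a feel for what a "90% similar" cutoff means, here is a minimal Python sketch of the Tanimoto coefficient, a common way to score 2D similarity between structural fingerprints. This page does not say which score is used, so treating the 90% setting as a Tanimoto threshold is an assumption here. The two fingerprints are toy sets of "on" bit positions made up for this example, not real fingerprints of aspirin or any other compound.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not bits_a and not bits_b:
        # Two empty fingerprints are treated as identical.
        return 1.0
    shared = len(bits_a & bits_b)
    # Shared bits divided by all bits that are on in either fingerprint.
    return shared / (len(bits_a) + len(bits_b) - shared)


# Toy fingerprints (hypothetical): ten bits each, nine of them in common.
query_bits = {1, 4, 7, 9, 12, 15, 18, 21, 30, 42}
candidate_bits = {1, 4, 7, 9, 12, 15, 18, 21, 30, 57}

score = tanimoto(query_bits, candidate_bits)
print(f"Tanimoto score: {score:.2f}")
print("Passes a 90% threshold:", score >= 0.90)
```

In this toy case the two fingerprints differ in only one bit each, yet the score is 9/11, which is below 0.90. A high threshold therefore keeps only very close relatives, and lowering it in "Settings" widens the dataset.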
Take-away Message
- So much information is aggregated onto PubChem Compound records that it is usually easy to find the record you are looking for
- While you can always scroll through the structured PubChem Compound pages, the Table of Contents for each record helps you find specific information in these often long pages
- PubChem Text Search and PubChem Compound were the key resources for this exercise. Review PubChem-sponsored documentation for more information
Last Reviewed: November 16, 2022