Breakout: Sequence Comparison (BLAST) & Visualization
What is BLAST and how can we analyze the results?
The Basic Local Alignment Search Tool (BLAST) is a software program for comparing a biological sequence to those in a database. Unlike a global alignment, which attempts to line up sequences from end to end, BLAST uses a multi-step process: it first finds matches of short sequences (words), then extends each match base by base on each end until the compared sequences no longer match up. Thus, you can find regions of similarity between sequences of distantly related organisms almost as well as longer stretches of homology between closely related organisms. BLAST was originally developed to identify related sequences in evolutionary studies, but its use has expanded to mapping sequences onto genomes, identifying a sequence's source organism, and more.
Some helpful information about BLAST from workshops taught by our Faculty:
- How BLAST works
- How Primer-BLAST works
- How to use BLAST to explore related sequences
- How to use Magic-BLAST to align sequence reads to a reference sequence
- How to use BLAST and STAT for Microbial Metagenomic analysis
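The word-match-then-extend idea described in the introduction can be illustrated with a small Python sketch. This is a teaching toy under stated assumptions, not the BLAST implementation: it extends only while bases match exactly and stops at the first mismatch, whereas the real program scores matches and mismatches and tolerates some of each. The function names and the default word size of 4 are hypothetical choices made for this example.

```python
def find_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    # Index every word of the subject by its start position.
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    # Look up every word of the query in that index.
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, word_size):
    """Extend a seed base by base in both directions until the first mismatch.

    Simplification (assumption of this sketch): no scoring, no gaps.
    Returns (query_start, subject_start, length) using 0-based positions.
    """
    left_q, left_s = i, j
    while left_q > 0 and left_s > 0 and query[left_q - 1] == subject[left_s - 1]:
        left_q -= 1
        left_s -= 1
    right_q, right_s = i + word_size, j + word_size
    while (right_q < len(query) and right_s < len(subject)
           and query[right_q] == subject[right_s]):
        right_q += 1
        right_s += 1
    return left_q, left_s, right_q - left_q


def local_matches(query, subject, word_size=4):
    """Collect the distinct extended matches, longest first."""
    hits = {extend_seed(query, subject, i, j, word_size)
            for i, j in find_seeds(query, subject, word_size)}
    return sorted(hits, key=lambda hit: -hit[2])


if __name__ == "__main__":
    # Made-up sequences that share the stretch ACGTACG.
    for hit in local_matches("TTTACGTACGAAA", "GGGACGTACGCCC"):
        print(hit)
```

Several overlapping seeds extend to the same region, which is why the sketch collects hits in a set before sorting them by length.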
A few examples of interest to biology teachers
Identifying a sequence: Which pathogen is it?
- Compare a sample sequence using BLASTn:
- produced with a universal primer set to the bacterial 16S rRNA database
- produced with a universal primer set to the fungal internal transcribed spacer (ITS) region database
- or a viral genomic sequence to the viral RefSeq genome database
- Tips on analyzing the results:
- Text-based Alignments
- Graphical Alignment display
Comparing protein sequences to see evolution: A hospital outbreak in the NIH Clinical Center!
- Use the Constraint-based Multiple Alignment Tool (COBALT) to:
- compare patient pathogen isolate sequences
- identify and sort by variants
- and use these to propose a path of infectivity
- Tips on analyzing the results:
- Text-based Alignments
- Graphical Alignment display
- TreeView display
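Once the isolate sequences are aligned, the "identify and sort by variants" step above amounts to listing the columns where each isolate differs from a reference and ordering isolates by those differences. The sketch below assumes the rows are already aligned (equal length) and uses made-up names and sequences; it does not call COBALT, and any transmission order it suggests is only a hypothesis to check against other evidence.

```python
def variant_positions(reference, isolate):
    """Return 1-based alignment columns where the isolate differs from the reference."""
    if len(reference) != len(isolate):
        raise ValueError("rows must come from the same alignment (equal length)")
    return [col for col, (ref_res, iso_res)
            in enumerate(zip(reference, isolate), start=1)
            if ref_res != iso_res]


def sort_by_variants(reference, isolates):
    """Order isolates by number of variant columns (fewest first), then by position."""
    profiles = {name: variant_positions(reference, row)
                for name, row in isolates.items()}
    return sorted(profiles.items(), key=lambda item: (len(item[1]), item[1]))


if __name__ == "__main__":
    # Hypothetical aligned rows, for illustration only.
    reference = "MKTLLVAG"
    isolates = {
        "patient_C": "MKSLLVTG",
        "patient_A": "MKTLLVAG",
        "patient_B": "MKSLLVAG",
    }
    for name, columns in sort_by_variants(reference, isolates):
        print(name, columns)
```

Isolates that share a variant and then add a new one (here patient_B, then patient_C) are candidates for adjacent steps in a proposed path of infection.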
Assessing PCR primers: Cloning a region or Designing a diagnostic
- Designing PCR primers for Huntington's Disease Diagnosis
- You HAVE PCR primers and want to use the Human Genome BLAST to test them for specificity in amplifying the triplet expansion region of the HTT gene: CAGCAGCGGCTGTGCCTGCGG & CCATGGCGACCCTGGAAAAGC
- Human Genome BLAST result
- Tip: Use BLASTn (NOT MegaBLAST). Paste your primer sequences with ~20 "N"s between the two, select the RefSeq Genome database, and limit the search to Human in the Organism section. Then expand the Advanced parameters section, set the word size to 7, and increase the E-value threshold to at least 1, because the shorter the hit, the larger its Expect value.
- Designing PCR primers to clone and then study an enterobacteria toxin
- You would like Primer BLAST to suggest some good PCR primers for cloning the E. coli O157:H7 Shiga toxin A gene
- Primer BLAST result
- Tip: You can start from a Gene record's "Genomic regions, transcripts and products" graphical view section, a Nucleotide sequence record's Graphics view display, or the Genome Data Viewer. To limit the search to helpful results, identify and select the regions (positions) where you would like each primer to anneal (control-click lets you select two different regions, for example on either side of a coding sequence or a variation), then click the "Tools" button > BLAST and Primer search > Primer BLAST (selection) to initiate the search.
- Tips on analyzing the results:
- Text-based Alignments
- Graphical Alignment display
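For the HTT primer-specificity tip above, the query can be assembled and the reasoning behind the E-value advice checked with a short sketch. The first function joins the two primers with a spacer of "N"s; the second applies the Karlin-Altschul relation E = K * m * n * exp(-lambda * S). The lambda, K, and database-length values below are illustrative assumptions, not values taken from this page, and the BLAST program itself uses length-adjusted search spaces, so treat the numbers as a trend rather than a prediction of what a search reports.

```python
import math

# Primer sequences quoted in the example above.
PRIMER_1 = "CAGCAGCGGCTGTGCCTGCGG"
PRIMER_2 = "CCATGGCGACCCTGGAAAAGC"

# Illustrative assumptions for this sketch only.
ASSUMED_LAMBDA = 1.28          # statistical parameter for the scoring scheme
ASSUMED_K = 0.46               # statistical parameter for the scoring scheme
ASSUMED_DB_LENGTH = 3.1e9      # roughly the size of a human genome, in bases


def build_primer_query(primer_a, primer_b, spacer_length=20):
    """Join two primers with a run of N's so both are searched in one query."""
    return primer_a + "N" * spacer_length + primer_b


def expect_value(raw_score, query_length, database_length,
                 lam=ASSUMED_LAMBDA, k=ASSUMED_K):
    """Karlin-Altschul estimate: E = K * m * n * exp(-lambda * S)."""
    return k * query_length * database_length * math.exp(-lam * raw_score)


if __name__ == "__main__":
    query = build_primer_query(PRIMER_1, PRIMER_2)
    print(query)
    # With a +1 match reward, an exact hit of L bases has raw score L,
    # so shorter exact hits give larger Expect values.
    for hit_length in (21, 18, 15):
        e_value = expect_value(hit_length, len(PRIMER_1), ASSUMED_DB_LENGTH)
        print(hit_length, f"{e_value:.3g}")
```

Because primer-length hits score low, their Expect values sit close to or above the default cutoff, which is why the tip raises the threshold.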
Learning about a gene sequence: Understanding the impact of a cancer patient's sequence
- Map a cancer patient's sequence to well-annotated RefSeq sequences to learn more about the sequence, its originating gene, and possible genetic variants.
- Using Human Genome BLAST to:
- find where it matches best to a similar sequence in the reference genome
- visualize in the Genome Data Viewer to find where the gene is in the chromosome and compare with information in the same area in annotation tracks
- Using BLASTx against the human refseq_protein database to:
- visualize the alignment in "Pairwise with dots for identities" and identify encoded variant amino acids in the protein sequence
- visualize in the Genome Data Viewer to find where the variant is located in the reference gene and compare to known functional regions in the protein
- You can map the hit to data in the same region in other tracks and link to key database records to learn more:
- You can use Primer BLAST to predict a PCR Primer pair for amplifying/sequencing this region which might serve as a potential diagnostic.
- Regions designated for Primer BLAST (view) | Primer Blast results
- You can use BLASTp to find orthologs, examine how far back in evolution the gene goes, and see whether the identified variant position shows evidence of conservation.
- Orthologs - Jawed Vertebrates | Do Protein Alignment, root with Human, zoom in to residue 61 to show invariance | Phylogenetic TreeView is available!
- You can use CD-Search (a.k.a. Reverse Position-Specific BLAST, RPS-BLAST) to identify conserved domains and key residues with functional annotations, view them in a 3D structure of the protein, and then pinpoint the location of a variant and predict its potential impact.
- Protein Alignment with CDDs
- Structure 6MNX | PMID 32605999
- Tips on analyzing the results:
- Text-based Alignments
- TreeView Display
- Using the Sequence Viewer's Graphics display (all sequences) or Genome Data Viewer (eukaryotes)
- iCn3D
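Two of the steps above, reading a "Pairwise with dots for identities" alignment and checking whether a residue position is invariant across orthologs, are simple to mimic on rows that are already aligned. The sketch below uses hypothetical function names and made-up peptide fragments; it only imitates the display convention and the conservation check, and does not reproduce BLAST or COBALT output.

```python
def dots_for_identities(query_row, subject_row):
    """Show the subject row with '.' wherever it is identical to the query row."""
    if len(query_row) != len(subject_row):
        raise ValueError("rows must be aligned (equal length)")
    return "".join("." if q == s and q != "-" else s
                   for q, s in zip(query_row, subject_row))


def column_is_invariant(aligned_rows, column):
    """True if every row has the same residue (not a gap) at a 1-based column."""
    residues = {row[column - 1] for row in aligned_rows}
    return len(residues) == 1 and "-" not in residues


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    reference = "GAGGVGKSAL"
    patient = "GAGDVGKSAL"
    print(reference)
    print(dots_for_identities(reference, patient))

    orthologs = ["GAGGVGKSAL", "GAGGVGKSAL", "GAGGIGKSAL"]
    print(column_is_invariant(orthologs, 4))
    print(column_is_invariant(orthologs, 5))
```

In the dotted row, the only letters left are the positions where the patient sequence differs, which makes an amino acid variant easy to spot; a variant that falls in a column that is invariant across orthologs is more likely to matter functionally.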
Last Reviewed: July 25, 2024