Exercise 3 - Visualize Data with the Genome Data Viewer

Background

What is the Genome Data Viewer?

Genome Data Viewer phylogenetic tree graphic from the main webpage

NCBI’s Genome Data Viewer (GDV) is a genome “browser” that supports the visualization of genetic data mapped against any of more than 1,500 NCBI-curated and annotated eukaryotic reference genomes: https://www.ncbi.nlm.nih.gov/genome/gdv/

Data is visualized in "tracks".

You can include gene/feature annotations, sequence coverage, GWAS data, and more!
You can mix and match your own tracks with NCBI- and partner-provided ones!

Small screenshot of Genome Data Viewer example featuring multiple tracks on display

Objective 3 Goals

Computational

Access and navigate the GDV
Upload custom data tracks to GDV
Parse biological meaning from alignment results
Use NCBI track data to find known clinical relevance

Case Study

Identify structural differences between patient DNA and the reference sequence to find possible deletions in a BBS-related gene
Use NCBI dbVar data to match results to known structural variants

Flowchart of current objectives. We have aligned sequences with magicblast and added them to our S3 bucket. Now we need to import the data from our bucket to the Genome Data Viewer to visualize it

Set Up the Genome Data Viewer

1) Open a new tab in your web browser and go to https://ncbi.nlm.nih.gov/genome/gdv/

2) Make sure Human is selected in the tree on the left

Highlighting the "Human" selection from the GDV Phylogenetic tree

3) Scroll to the bottom of the page and click the chromosome 7 image to load the Genome Data Viewer on chromosome 7 of the human reference genome

Highlighting the Chromosome 7 graphic in the Human selection on GDV

4) The GDV page comes pre-loaded with several tracks aligned against the chromosome. Most of these are not useful to us today, so we can use the red X buttons in the top right corner of each track to delete them. Do this for every track except the top one. This top track shows every gene and its position on the chromosome.

Highlighting the red x in the corner of each track we wish to delete

NOTE: There are LOTS of NCBI-offered tracks you can add to compare against your own data. To learn more about them, click the little gear at the bottom of the viewer page.

Importing Our Data

1) Click on User Data and Track Hubs on the left side of the screen

Highlighting the "User Data and Track Hubs" tab in the left side of the GDV page

2) Click the Options pulldown menu and click Add Remote Files…

Highlighting the "Add Remote Files" option from the options tab

3) Navigate back to your S3 bucket tab and click on the SRR6314034.sorted.bam file to open up the details for the file

Highlighting the "SRR6314034.sorted.bam" file in our S3 bucket page

4) On the new page, click the “Copy” button next to the Object URL to copy the URL path to the file to your clipboard

Highlighting the double boxes "copy" button next to the Object URL path in the S3 page

5) Go back to your GDV tab and paste the link into the URL box. Next, add a familiar name like Child to the Name box to help us identify the track later. Then click Add
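The Object URL you copied in step 4 is just a web address for the BAM file. S3 console Object URLs typically follow the virtual-hosted-style pattern https://<bucket>.s3.<region>.amazonaws.com/<key>. The sketch below only shows how that string is put together; the bucket name and region are hypothetical placeholders, so always paste the real URL from your own S3 page.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style S3 object URL.

    Pattern assumed: https://<bucket>.s3.<region>.amazonaws.com/<key>
    """
    # Percent-encode the object key but keep "/" so folder-like prefixes stay intact.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


# Hypothetical bucket name and region; substitute the values from your own S3 console.
url = s3_object_url("my-workshop-bucket", "us-east-1", "SRR6314034.sorted.bam")
print(url)
```

If the URL you paste into GDV does not match this general shape, double-check that you copied the Object URL and not some other link on the S3 page.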

Adding a name "Child" to the remote file box in the GDV webpage

6) This track shows all of the Magic-BLAST results in a “pile-up” view. This is essentially one long histogram, where a taller bar marks a region of the chromosome where more reads from the sample aligned. Because our sequences come from a single gene but our current view shows the entire chromosome, the pile-up view may look a little bland.
Try to find the region of the chromosome that our reads aligned to:
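To make the “histogram” idea concrete, here is a minimal sketch of what each pile-up bar counts: the number of reads covering each reference position. It assumes each aligned read is a simple half-open (start, end) interval on the reference and uses made-up toy coordinates, not the workshop data. GDV computes this for you from the BAM file; the `covered_region` helper (a hypothetical name) mirrors what you are doing by eye when you look for the peak.

```python
def coverage(alignments, start, end):
    """Count how many reads cover each reference position in [start, end).

    alignments: list of (read_start, read_end) half-open intervals on the reference.
    """
    depth = [0] * (end - start)
    for read_start, read_end in alignments:
        # Clip each read to the window being drawn.
        lo = max(read_start, start)
        hi = min(read_end, end)
        for pos in range(lo, hi):
            depth[pos - start] += 1
    return depth


def covered_region(depth, start, min_depth=1):
    """Return (first, last + 1) reference coordinates with depth >= min_depth, or None."""
    hits = [start + i for i, d in enumerate(depth) if d >= min_depth]
    if not hits:
        return None
    return hits[0], hits[-1] + 1


# Toy reads with hypothetical coordinates (not the workshop data).
reads = [(100, 110), (105, 115), (108, 120)]
depth = coverage(reads, 95, 125)
print(max(depth))                  # 3 (tallest bar: positions 108-109)
print(covered_region(depth, 95))   # (100, 120)
```

On a whole-chromosome view, a region like this is only a few pixels wide, which is why the peak is easy to miss until you zoom in.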

Highlighting the small peak in the pile-up view of our custom track

7) In the scale bar at the top of the viewer, click and drag across the section where our reads aligned to highlight it. Then click Zoom On Range in the pop-up menu.

Highlighting the sequence range covered at the top of the tracks panel and the "Zoom on Range" option from the drop down menu

8) Repeat step 7 using the new view to refine the range again if the view didn’t change very much.

Highlighting the sequence range covered in the second zoom in for the tracks panel and the "Zoom on Range" option selected from the drop down menu

9) Your view should now look similar to the screenshot below. If you don’t see the mess of red lines below the thick black bar, that’s okay! We will turn them off next anyway.

Showing the updated pile-up view after enough zooming in to the custom track.

NOTE: The tracks may be slightly different depending on how you have zoomed in. As long as you can see the image above somewhere on your screen you are doing great!

10) The red lines underneath the thick black bar show how each individual sequence read aligned to the reference sequence. This view isn’t very helpful to us, so let’s turn it off.

NOTE: If you can’t see these lines, that’s great! You can skip Steps 11 & 12.

11) Click the little gear in the top right corner of our Child track to open its settings.

Highlighting the small gear icon in the top-right corner of the track to customize the view

12) On this new menu page, change the “Alignment Display” to “Packed” and then click Apply. You can change many other settings here as well, but I’ll leave that up to you to explore outside of this workshop.

Highlighting the "Packed" option from the drop-down menu of the Alignment Display option. Then highlighting the "Apply" button to save the change.

Now that we can see the range a bit better, let’s break down what each of these colors represents in the pile-up view:

Grey – This is the standard “bar” for the pile-up view. The taller this bar is in a particular region, the greater the coverage is from the mapped reads.
Red – These are locations in the genome where reads mapped, but with mismatches in the nucleotide sequence compared to the reference sequence.
Black – These are gaps that exist in the read alignment to the reference genome (i.e., the reads only have a nucleotide sequence that covers before/after the large black chunk).
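Where do the black gaps come from? In a SAM/BAM alignment (the format of SRR6314034.sorted.bam), each read’s CIGAR string records how it lines up with the reference, and a stretch of reference that the read skips is written as a D (deletion) or N (skipped region) operation. The minimal sketch below walks a CIGAR string and reports those skipped reference intervals. Which of D or N a particular aligner uses for a large gap can vary, so the sketch treats both the same; the function name and the example coordinates are hypothetical.

```python
import re

# CIGAR operations that advance along the reference (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def gap_intervals(pos, cigar):
    """Return reference intervals [start, end) that a read skips (D or N operations).

    pos: 1-based leftmost reference position of the alignment (SAM POS field).
    cigar: CIGAR string, e.g. "50M1200N50M".
    """
    gaps = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "DN":
            gaps.append((ref, ref + length))
        if op in REF_CONSUMING:
            ref += length
    return gaps


# Hypothetical read: 50 bases aligned, a 1,200-base gap, then 50 more aligned bases.
print(gap_intervals(1000, "50M1200N50M"))  # [(1050, 2250)]
```

When many reads share the same skipped interval, the viewer shows it as a solid black block with reads aligning only before and after it, which is the signature of a possible deletion.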

If you want to explore the pile-up view a bit more, try using the buttons in the toolbar just above the numeric range to navigate the assembly. Hold your mouse over the button for a description of what they can do!

Adding NCBI Data

1) Click the Tracks button in the bottom right corner of the viewer panel to open the Configure Page

Highlighting the "Tracks Shown" button in the bottom-right corner of the tracks page on GDV

2) Click the Variation tab and scroll way WAY down to the dbVar category. Check the box next to dbVar Pathogenic Clinical Structural Variants, then click Configure in the bottom right corner of the page.

Highlighting the Variation tab on the right-hand side of the pop-up menu, the dbVar category from the scrolling list, and the checkbox next to "dbVar Pathogenic Clinical Structural Variants (subset of nstd102)" option in that category

3) You should now have a new structural variants track loaded into the viewer. This track shows the region of the genome where each variant is found. Blue variants represent insertions in this region, while red variants represent deletions. Because our alignment suggests a deletion, we want to focus on the red variants.

Showing a subset of variants aligned to the BBS9 gene region

4) Next, zoom in on the right half of our aligned region in the Child track, as in the screenshot below. We want to look closely at the BBS9 gene to find structural variants that overlap with this region.

Showing the highlighted region of the sequence which covers the beginning of the BBS9 gene region until the end of our custom data alignment to zoom in on

5) Obviously, there are a LOT of variants that overlap this region. However, we only care about the region of the gene that we sequenced. Variants that extend beyond our sequenced region are less likely to be relevant to us. So rather than aimlessly checking every red variant, let’s look only for variants that start or stop within our deletion.
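The filtering rule in step 5 is a simple interval test: keep a deletion variant only if its start or its end falls inside the deletion we observed. A minimal sketch, assuming each variant is reduced to an accession, a type, and start/end coordinates; every accession and coordinate below is hypothetical, not taken from dbVar.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    accession: str
    var_type: str  # e.g. "deletion" or "insertion"
    start: int
    end: int


def breakpoints_within(variants, del_start, del_end, var_type="deletion"):
    """Keep variants of var_type whose start or end lies inside [del_start, del_end]."""
    return [
        v for v in variants
        if v.var_type == var_type
        and (del_start <= v.start <= del_end or del_start <= v.end <= del_end)
    ]


# Hypothetical observed deletion and track contents, for illustration only.
observed_deletion = (5_000, 8_000)
track = [
    Variant("var_A", "deletion", 1_000, 20_000),  # spans the whole deletion: excluded
    Variant("var_B", "deletion", 6_500, 30_000),  # starts inside the deletion: kept
    Variant("var_C", "insertion", 6_000, 6_001),  # not a deletion: excluded
    Variant("var_D", "deletion", 9_000, 12_000),  # entirely outside: excluded
]
hits = breakpoints_within(track, *observed_deletion)
print([v.accession for v in hits])  # ['var_B']
```

Note that a variant which swallows the whole observed deletion (like var_A) is dropped by this rule, matching the idea that very large variants extend beyond our sequenced region.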

Oh… there’s just one? Let’s check that one then.

Highlighting the only variant which meets our criteria "nsv1398255"

6) Mouse over the variant nsv1398255 to get a new pop-up menu and select the dbVar link at the bottom of it

Highlighting the pop-up menu from clicking on the variant and the dbVar link at the bottom of the menu to click on

7) On the dbVar page, navigate to the Clinical Assertions panel to see which clinical conditions have been associated with this deletion

Showing a subset of the Clinical Assertions table which confirms the phenotype of Bardet-Biedl syndrome caused by this particular deletion

8) Just as we suspected! This region is associated with Bardet-Biedl syndrome. If we wanted to, we could click on the phenotype and explore more about the condition. But that is something you need to explore on your own, because this is the end of the worksheet!


Last Reviewed: June 30, 2022