Downloading NCBI Biological Data and Creating Custom Reports Using the Command Line

This workshop was offered virtually on April 25, 2023.

NCBI Faculty:  Peter Cooper, PhD & Wayne Matten, PhD

Workshop Duration:
2 hours

Content Difficulty: Intermediate

Target Audience:

This workshop is for biological researchers who would like to incorporate NCBI command-line clients into their workflows to access and process NCBI molecular data and metadata. You do not need prior experience with the Entrez Direct (EDirect) suite or the Datasets command-line interface (CLI) tools (datasets and dataformat), but to get the most out of this workshop you should be familiar with NCBI databases and comfortable using the Unix/Linux shell.

Workshop Description:

In this workshop you will learn to use both the EDirect suite and the Datasets CLI to download gene sequences, genome assemblies, and their associated metadata, and to create custom reports that cross-reference biological features and sequence data.

In this workshop you will learn how to:

  • Use the EDirect suite to search for and collect sequence and gene data across NCBI databases
  • Incorporate the EDirect XML parser Xtract into workflows to create and format custom reports
  • Use the Datasets CLI to access and download genome sequences and metadata in order to build custom databases
  • Use the dataformat tool to generate reports from downloaded genome metadata to classify and filter genomes by biological criteria
  • Incorporate these tools into workflows with other bioinformatic tools such as BLAST

If you are a command-line novice and want to gain familiarity with the command line while also using NCBI tools, please consider applying to the workshop on March 28, 2023: An Introduction to Accessing NCBI Resources on the Command Line using EDirect for Biologists.


Data Access Technology: Jupyter Notebook, Command-line

NCBI Resources: Entrez Direct (EDirect), NCBI Datasets Command-line (CLI) tools

Please note that the first time you access this notebook, it may take up to ten minutes to start up.

Last Reviewed: April 12, 2023