Downloading NCBI Biological Data and Creating Custom Reports Using the Command Line
NCBI Faculty: Peter Cooper, PhD & Wayne Matten, PhD
Workshop Duration: 2 hours
Content Difficulty: Intermediate
Target Audience:
Workshop Description:
In this workshop you will learn to use both the EDirect suite and the Datasets CLI to download gene sequences, genome assemblies, and their associated metadata, and to create custom reports that cross-reference biological features and sequence data.
In this workshop you will learn how to:
- Use the EDirect suite to search for and collect sequence and gene data across NCBI databases
- Incorporate the EDirect XML parser Xtract into workflows to create and format custom reports
- Use the Datasets CLI to access and download genome sequences and metadata in order to build custom databases
- Use the dataformat tool to generate reports from downloaded genome metadata to classify and filter genomes by biological criteria
- Incorporate these tools into workflows with other bioinformatic tools such as BLAST
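To give a sense of the kind of workflow listed above, here is a minimal sketch, not taken from the workshop materials. It assumes EDirect and the NCBI Datasets CLI are installed and on your PATH. The function names gene_report and genome_report are hypothetical, and the dataformat field names are assumed to match a recent Datasets CLI release.

```shell
# Sketch only: assumes EDirect (esearch, efetch, xtract) and the
# NCBI Datasets CLI (datasets, dataformat) are installed and on PATH.

# Hypothetical helper: print a tab-delimited gene report
# (Gene ID, symbol, description) for an Entrez query given as $1.
gene_report() {
  esearch -db gene -query "$1" |
    efetch -format docsum |
    xtract -pattern DocumentSummary -element Id Name Description
}

# Hypothetical helper: print a TSV genome metadata report for the taxon in $1.
# Field names are an assumption based on recent Datasets CLI releases.
genome_report() {
  datasets summary genome taxon "$1" --as-json-lines |
    dataformat tsv genome --fields accession,organism-name,assminfo-level
}
```

With network access, a call such as gene_report "BRCA1[gene] AND human[orgn]" or genome_report "Escherichia coli" would send the query to NCBI and print one row per record. The results depend on the current database contents, so no sample output is shown here.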
If you are a command-line novice and want to gain familiarity with the command line while also using NCBI tools, please consider applying to the workshop on March 28, 2023: An Introduction to Accessing NCBI Resources on the Command Line using EDirect for Biologists.
Data Access Technology: Jupyter Notebook, Command-line
NCBI Resources: Entrez Direct (EDirect), NCBI Datasets Command-line (CLI) tools
Last Reviewed: April 12, 2023