National Library of Medicine Technical Bulletin

Table of Contents: 2018 SEPTEMBER–OCTOBER No. 424


GenBank Expanded Accession Formats Coming December 2018

GenBank Expanded Accession Formats Coming December 2018. NLM Tech Bull. 2018 Sep-Oct;(424):b9.

2018 September 26 [posted]

[Editor's Note: This is a reprint of an announcement from the National Center for Biotechnology Information (NCBI). To automatically receive the latest news and announcements about major changes and updates to NCBI resources and tools, please see the subscribe page.]

In December 2018, GenBank and other International Nucleotide Sequence Database Collaboration (INSDC) members will expand the accession formats used for sequencing projects. Nearly all possible accession numbers using the current, shorter formats have been assigned. The longer formats will expand the available accession ranges and provide greater capacity.

The expanded format for Whole Genome Shotgun (WGS), Transcriptome Shotgun Assembly (TSA), and Targeted Locus Study (TLS) sequencing projects will use a six-letter Project Code prefix and a two-digit Assembly-Version number followed by 7, 8, or 9 digits (for example, AAAAAA020000001).
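For readers updating their own parsers, the expanded project format described above can be sketched as a regular expression. This is a minimal illustration, not NCBI code: the names `EXPANDED_PROJECT_RE`, `ProjectAccession`, and `parse_expanded_project_accession` are hypothetical, and the restriction to uppercase letters is an assumption based on the example accession.

```python
import re
from typing import NamedTuple, Optional

# Expanded project format as described in the announcement:
# six letters, a two-digit assembly version, then 7, 8, or 9 digits.
# Uppercase-only letters are an assumption drawn from the example.
EXPANDED_PROJECT_RE = re.compile(r"([A-Z]{6})([0-9]{2})([0-9]{7,9})")


class ProjectAccession(NamedTuple):
    project_code: str      # six-letter Project Code prefix
    assembly_version: int  # two-digit Assembly-Version number
    serial: str            # trailing 7 to 9 digits, kept as text


def parse_expanded_project_accession(text: str) -> Optional[ProjectAccession]:
    """Split an expanded-format accession into its parts, or return None."""
    match = EXPANDED_PROJECT_RE.fullmatch(text.strip())
    if match is None:
        return None
    prefix, version, serial = match.groups()
    return ProjectAccession(prefix, int(version), serial)


if __name__ == "__main__":
    # The example accession from the announcement.
    print(parse_expanded_project_accession("AAAAAA020000001"))
```

The trailing digits are kept as a string so that leading zeros survive; the pattern does not attempt to match the older, shorter project format.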

Non-WGS/TLS/TSA nucleotide sequences currently use a "2+6" format: a two-letter prefix followed by six digits. The expanded format will keep the two-letter prefix and use eight digits.

Protein sequences currently use a "3+5" accession format: a three-letter prefix followed by five digits. By the end of 2018, the expanded format will keep the three-letter prefix and use seven digits.
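One way to accommodate both the current and the expanded non-project formats is to check an identifier against each pattern in turn. The sketch below is illustrative only: `ACCESSION_PATTERNS` and `classify_accession` are hypothetical names, the sample strings are made up rather than real records, uppercase-only prefixes are an assumption, and formats not described in this announcement are deliberately left out.

```python
import re
from typing import Optional

# Patterns follow the announcement's description of non-project accessions.
# Uppercase-only prefixes are an assumption; other formats are not handled.
ACCESSION_PATTERNS = {
    "nucleotide, current (2 letters + 6 digits)": re.compile(r"[A-Z]{2}[0-9]{6}"),
    "nucleotide, expanded (2 letters + 8 digits)": re.compile(r"[A-Z]{2}[0-9]{8}"),
    "protein, current (3 letters + 5 digits)": re.compile(r"[A-Z]{3}[0-9]{5}"),
    "protein, expanded (3 letters + 7 digits)": re.compile(r"[A-Z]{3}[0-9]{7}"),
}


def classify_accession(text: str) -> Optional[str]:
    """Return the label of the first pattern that matches the whole string."""
    candidate = text.strip()
    for label, pattern in ACCESSION_PATTERNS.items():
        if pattern.fullmatch(candidate):
            return label
    return None


if __name__ == "__main__":
    # Made-up identifiers used only to exercise each pattern.
    for sample in ("AB123456", "AB12345678", "ABC12345", "ABC1234567", "AB1234567"):
        print(sample, "->", classify_accession(sample))
```

Matching the whole string with `fullmatch` keeps an identifier with an intermediate digit count (seven digits after a two-letter prefix, for instance) from being silently accepted as one of the four formats.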

Please adjust any processing methods to accommodate these new identifier formats. If you have questions about the new formats, write to the NCBI help desk.
