Two-Dimensional Materials (Texts, Manuscripts, Graphics)
NLM currently digitizes printed monographs in-house using two Book2net Cobra V-Scan image capture systems (image area approx. 18" X 25" per page), two i2S CopiBook scanners (image area approx. 18" X 25" per page), and a Zeutschel 14000-A large format scanner (image area approx. 24" X 38"). The Zeutschel scanner is used primarily for fold-outs and large flat paper objects. Initial monograph digitization was done in-house on a Kirtas KABIS III scanner. Some additional content was scanned offsite by vendors. Digitized texts contain the following components:
Monographs and Serials
Per Book
- OCR – composite text file
- Full color PDF
- Descriptive metadata files (see Descriptive Metadata below)
- METS – an XML document produced by the scanner’s software which encodes page image sequencing as well as technical details of the scanning operation.
- Preview image in JPG format
- Thumbnail image in JPG format
Per Page
- Preservation master image in uncompressed TIFF format, 400 DPI, 24-bit color, with key technical metadata embedded. Earlier images captured with the KABIS III scanner have JPEG masters.
- Access derivative image in JPEG2000 format
- OCR – page text file
- ALTO – an XML schema for encoding the structure of physical text resources. This stand-alone file of text is generated per page.
- MIX – an XML schema for encoding technical metadata about digital still images. This file is produced per page by NLM’s book scanner and embedded in the book-level METS file.
- Thumbnail in JPG format
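As an illustration of how a book-level METS file can encode page image sequencing, the following Python sketch reads the physical structure map of a small, made-up METS document and lists the page files in order. The sample XML, file IDs, and file names are hypothetical and do not reflect the actual output of NLM's scanner software; only the METS namespace and the standard `fileSec`, `file`, `FLocat`, `structMap`, `div`, and `fptr` elements come from the METS schema.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical two-page METS document; real files are far more detailed.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="master">
      <mets:file ID="IMG0002"><mets:FLocat LOCTYPE="URL" xlink:href="page_0002.tif"/></mets:file>
      <mets:file ID="IMG0001"><mets:FLocat LOCTYPE="URL" xlink:href="page_0001.tif"/></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div TYPE="book">
      <mets:div TYPE="page" ORDER="2"><mets:fptr FILEID="IMG0002"/></mets:div>
      <mets:div TYPE="page" ORDER="1"><mets:fptr FILEID="IMG0001"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def page_sequence(mets_xml):
    """Return page file names sorted by the ORDER attribute of each page div."""
    root = ET.fromstring(mets_xml)

    # Map each file ID to the location recorded in its FLocat element.
    files = {}
    for file_el in root.iterfind(".//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        if flocat is not None:
            files[file_el.get("ID")] = flocat.get(XLINK_HREF)

    # Collect (order, file name) pairs from the page-level divs.
    pages = []
    for div in root.iterfind(".//mets:div[@TYPE='page']", NS):
        fptr = div.find("mets:fptr", NS)
        if fptr is not None:
            pages.append((int(div.get("ORDER")), files[fptr.get("FILEID")]))

    return [name for _, name in sorted(pages)]


print(page_sequence(SAMPLE_METS))  # ['page_0001.tif', 'page_0002.tif']
```

The sketch sorts on `ORDER` instead of relying on document order, since the elements in a METS file are not guaranteed to appear in reading sequence.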
Other Formats
Digitized manuscript materials, graphical prints of ink on paper, photographic prints, maps, and photographic films are typically produced with TIFF masters and JPEG access derivatives. Scanning DPI standards are generally higher, up to 600 DPI. Most of these materials do not generate text files.
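To give a sense of scale for uncompressed masters, the Python sketch below estimates the raw pixel-data size of a 24-bit color image from its physical dimensions and scanning resolution. The 18" x 25" area is the maximum image area quoted for the book scanners, used here only as an example and not as the size of a typical object. The estimate excludes TIFF headers and embedded metadata, and the function name is illustrative.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Estimate raw pixel-data size in bytes for an uncompressed image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


for dpi in (400, 600):
    size = uncompressed_bytes(18, 25, dpi)
    print(f"{dpi} DPI: {size / 1_000_000:.0f} MB")
# 400 DPI: 216 MB
# 600 DPI: 486 MB
```

Because pixel count grows with the square of the resolution, moving from 400 to 600 DPI multiplies the raw size by 2.25 for the same physical area.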
Moving Images (Films and Videotapes)
Many of NLM’s film and video materials were digitized to MPEG2 from BetacamSP or DVD copies. The BetacamSP preservation copies were produced by offsite vendors. Other films and videos were digitized to Matroska (.mkv) from DPX by offsite vendors.
- MPEG2 digital masters are full-resolution 640x480 NTSC video, with audio as in the original.
- Matroska digital masters use FFV1 video encoding at full resolution, 2560 x 1920 for film and 720x486 for NTSC video, with audio as in the original.
- Access derivatives are created in the MP4 or MOV format with H.264 compression and AAC audio.
- Descriptive metadata files (see Descriptive Metadata below)
- METS XML preservation metadata
- Transcript (text file)
- Time-coded captions in DFXP, SRT, and VTT formats
- Preview image in JPG format
- Thumbnail in JPG format
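SRT and VTT (WebVTT) captions are structurally similar, which the following Python sketch illustrates by converting a short SRT sample to WebVTT: it adds the `WEBVTT` header and changes the millisecond separator in timestamps from a comma to a period. This is a minimal illustration using a made-up caption sample, not a description of how NLM produces its caption files, and it ignores styling and positioning features.

```python
import re

# Matches SRT timestamps such as 00:00:01,000 (comma before milliseconds).
TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")

# Hypothetical caption sample for illustration only.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,000
Hypothetical caption text.

2
00:00:04,500 --> 00:00:07,250
A second caption.
"""


def srt_to_vtt(srt_text):
    """Convert simple SRT caption text to WebVTT."""
    lines = []
    for line in srt_text.strip().splitlines():
        # Only timing lines are rewritten; caption text is left untouched.
        if "-->" in line:
            line = TIMESTAMP.sub(r"\1.\2", line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"


print(srt_to_vtt(SAMPLE_SRT))
```

The numeric SRT counters are kept because WebVTT accepts them as optional cue identifiers.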
Audio
- Master recording digitized to WAV format
- Access derivatives are created in MP3 format
- Transcript
Descriptive Metadata
- Typically, each resource in Digital Collections has a corresponding MARC bibliographic record in NLM's LocatorPlus Catalog.
- For each digitized resource, three descriptive metadata files are generated:
  - MARCXML – the base metadata used to generate other metadata in Digital Collections; supplied to Internet Archive for NLM digital resources also made available on that site
  - OAI-compliant Dublin Core – for public consumption
  - DMDINDEX – an internal custom scheme that drives indexing and UI display
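As a rough illustration of how Dublin Core can be derived from MARCXML, the Python sketch below maps a few MARC fields to Dublin Core elements, loosely following the Library of Congress MARC to Dublin Core crosswalk (245 to title, 100 to creator, 260$b to publisher, 260$c to date). The sample record and the choice of fields are assumptions made for illustration; NLM's actual transformation and its DMDINDEX scheme are not represented here.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# (MARC tag, subfield code, Dublin Core element); a small illustrative subset.
CROSSWALK = [
    ("245", "a", "title"),
    ("100", "a", "creator"),
    ("260", "b", "publisher"),
    ("260", "c", "date"),
]

# Hypothetical record; not taken from any real catalog.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">A hypothetical treatise /</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="b">Example Press,</subfield>
    <subfield code="c">1850.</subfield>
  </datafield>
</record>
"""


def marc_to_dc(marcxml):
    """Return a dict of Dublin Core element name -> list of values."""
    record = ET.fromstring(marcxml)
    dc = {}
    for tag, code, element in CROSSWALK:
        path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
        for subfield in record.iterfind(path, MARC_NS):
            # Drop trailing ISBD punctuation such as " /" or "," but keep periods.
            value = (subfield.text or "").strip().rstrip(" /:;,")
            if value:
                dc.setdefault(element, []).append(value)
    return dc


print(marc_to_dc(SAMPLE_MARCXML))
```

Values are collected in lists because Dublin Core elements are repeatable, and a MARC record can carry several fields that map to the same element.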
Last Reviewed: November 1, 2024