TAPE INDEXING METHODOLOGY FOR LONG TERM ARCHIVES

IP.com Disclosure Number: IPCOM000174460D
Original Publication Date: 2008-Sep-09
Included in the Prior Art Database: 2008-Sep-09
Document File: 8 page(s) / 45K

Publishing Venue

IBM

Abstract

Disclosed is a convention for long-term archival storage on digital tape such as LTO (Linear Tape Open). The proposal focuses on storing data and its associated index metadata on tape in a way that optimizes data recall. It maximizes readability through redundancy, through index and data locality, and by minimizing or eliminating dependencies on external data or software. Source data files are simply concatenated into "data areas" and written to tape. The metadata about the source data files, including file name and size, is collected, formatted as XML, and written to tape, creating an "index area". Additional data areas and index areas can be appended until the tape is full. Index areas are cumulative: each contains a complete record of the data preceding it on the tape. The last index area on a tape is written twice. Data areas and index areas are separated by tape filemarks. Data area sizes and index area formats are not defined further; they are left open to implementation- and technology-specific details. Digital media archives of video or film assets are particularly suited to this convention, since it matches the current media archive paradigm more closely than storage-manager-driven automated tape libraries do. Adoption of this convention will facilitate the entry of LTO and similar formats into this market segment.

This is the abbreviated version, containing approximately 19% of the total text.

Introduction

Digital tape formats such as LTO (Linear Tape Open) are popular for the economical storage of large amounts of data. Unfortunately, tape suffers from several drawbacks such as the lack of an index structure and serial access restrictions. Existing archives use storage managers or file aggregation programs like tar to overcome these limitations.

However, storage managers and aggregation programs create other issues which render them unsuitable for certain archive applications involving the long term offline storage of data. A data format design is proposed which is better suited to this class of archive.

The focus will be on LTO tapes; however, the principles apply to any serial data storage technology.

Tape Format Details

LTO tape has a simple layout. Data is written as blocks, starting at the beginning of the tape and continuing toward the end. Between blocks, an application can write a filemark. Filemarks are intended to separate logical groups of data blocks. Two filemarks in a row are interpreted as the end of data on the tape. A tape drive can scan for filemarks and quickly position to a specific filemark.
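
The layout described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the tape is represented as a Python list, FILEMARK is a hypothetical sentinel standing in for a real filemark, and split_files is an illustrative helper, not part of any tape drive interface.

```python
# Hypothetical in-memory model of a tape: a list of data blocks (bytes)
# interleaved with filemarks. Nothing here talks to a real drive.
FILEMARK = object()  # illustrative sentinel standing in for a tape filemark


def split_files(tape):
    """Group data blocks into logical files delimited by filemarks.

    Reading stops at two consecutive filemarks, which mark the end of data.
    Blocks not yet terminated by a filemark are not returned.
    """
    files = []
    current = []
    previous_was_filemark = False
    for item in tape:
        if item is FILEMARK:
            if previous_was_filemark:
                break  # second filemark in a row: end of data
            files.append(current)
            current = []
            previous_was_filemark = True
        else:
            current.append(item)
            previous_was_filemark = False
    return files


# Two logical groups of blocks, then end of data; anything after it is ignored.
tape = [b"a1", b"a2", FILEMARK, b"b1", FILEMARK, FILEMARK, b"stale"]
groups = split_files(tape)
```

In this model, groups holds one list of blocks per logical group, and the block after the double filemark is never reached.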

Tape Format Issues

No Indexing Structure

There is no inherent mechanism for indexing or holding descriptive information about the data on the tape. Disk drives are formatted in various ways (FAT, FAT32, NTFS, JFS, etc.), and this formatting creates a mapping between metadata (e.g. filename, permissions, size) and the blocks of data. Tape has no such formatting structure. It consists only of data blocks and filemarks.

Serial Access

Tape is a serial, appendable technology. Data is written as the tape moves forward. It is difficult or impossible to rewrite a block of data in place (update in place) and still retain the data written after it: once a block is rewritten, all subsequent blocks must be rewritten as well. This makes it difficult or impossible to use methods such as those employed on disks, where an area is set aside for index information and continually updated as data is added or modified. Standard practice for writing to a partially filled tape is to position to the end of the data (scan for two filemarks), back up one filemark, write any new data, then write two filemarks to indicate the new end of data.
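
The append procedure in the last sentence can be sketched against a simple in-memory model. The names below (FILEMARK, find_end_of_data, append_blocks) are hypothetical and chosen for illustration; a real implementation would issue drive positioning and write-filemark commands instead of editing a Python list.

```python
# Hypothetical in-memory model: a tape is a list of data blocks (bytes)
# and filemarks. This illustrates the procedure only, not a drive API.
FILEMARK = object()  # illustrative sentinel standing in for a tape filemark


def find_end_of_data(tape):
    """Return the index of the first of two consecutive filemarks, or None."""
    for i in range(len(tape) - 1):
        if tape[i] is FILEMARK and tape[i + 1] is FILEMARK:
            return i
    return None


def append_blocks(tape, new_blocks):
    """Append blocks to a partially filled tape and mark the new end of data."""
    end = find_end_of_data(tape)
    if end is None:
        raise ValueError("no end-of-data marker (two filemarks) found")
    # Backing up one filemark leaves the first filemark in place as the
    # separator, so writing resumes on top of the second filemark.
    write_position = end + 1
    # On a serial medium, writing here discards everything that follows.
    del tape[write_position:]
    tape.extend(new_blocks)
    # Two filemarks in a row indicate the new end of data.
    tape.extend([FILEMARK, FILEMARK])
    return tape


tape = [b"d1", b"d2", FILEMARK, FILEMARK]
append_blocks(tape, [b"d3"])
```

After the call, the model tape holds the original blocks, one separating filemark, the new block, and a fresh pair of filemarks.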

Buffer Flushes

Tape drives employ a memory buffer to better match data speeds to tape speeds. Ideally, the tape continues to move at its top speed while data arrives quickly enough to keep the buffer supplied with data to write. If the buffer empties, the tape stops and then needs to restart, a time-consuming process that can significantly reduce throughput below the rated data transfer speed. Filemarks can cause a buffer flush, which assures an application that all of its data has been written to tape and would not be lost from the buffer in the event of a power failure. Writing a large number of small files separated by f...