
System and method for improving ingestion rate of data stores

IP.com Disclosure Number: IPCOM000175631D
Original Publication Date: 2008-Oct-16
Included in the Prior Art Database: 2008-Oct-16
Document File: 9 page(s) / 42K

Publishing Venue

IBM

Abstract

As the volume of stored data grows, the requirements for managing that data also become increasingly complicated, driven by both regulatory compliance and business needs. Such requirements range from storage policies that ensure data is copied to and placed in the right location in the right format, to application-specific content mining and analysis that seeks to extract more value from data beyond archival and retrieval.



THIS COPY WAS MADE FROM AN INTERNAL IBM DOCUMENT AND NOT FROM THE PUBLISHED BOOK

ARC820060157 Jeffrey P Aiello/Almaden/IBM Windsor Hsu, Xiaonan Ma


Background:


In today's world, the amount of information that needs to be archived grows rapidly, along with the retention periods of many data records. The situation is compounded by a growing number of regulations, motivated by high-profile industry scandals, that dictate how electronic records should be managed and retained.

  As the volume of stored data grows, the requirements for managing that data also become increasingly complicated, driven by both regulatory compliance and business needs. Such requirements range from storage policies that ensure data is copied to and placed in the right location in the right format, to application-specific content mining and analysis that seeks to extract more value from data beyond archival and retrieval. The requirements can cover many aspects of ILM (information lifecycle management). For example, they may specify how long a particular type of document should be stored (retention), what type of storage should be used (WORM or rewritable), how data should be migrated among storage tiers for cost effectiveness, whether to enable encryption or secure shredding, the number of replicas, etc. Furthermore, specific data management actions often need to be invoked for a particular class of objects. Unlike traditional storage management systems, where such classification is based mostly on simple object attributes such as file name or access time, today's data stores increasingly classify and manage objects based on application-specific metadata and actual document content.
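To make the classification step concrete, here is a minimal Python sketch under an assumed rule-based design that the disclosure does not specify. Each rule pairs a predicate over an object's application-specific metadata or content with a bundle of the ILM settings listed above. The names `IlmPolicy`, `DataObject`, `classify`, `RULES`, and `DEFAULT_POLICY`, as well as the retention periods and replica counts, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class IlmPolicy:
    """Hypothetical bundle of the ILM settings listed in the text."""
    name: str
    retention_days: int           # how long the object must be kept
    worm: bool                    # WORM (True) or rewritable (False) storage
    encrypt: bool                 # whether to enable encryption / secure shredding
    replicas: int                 # number of copies to maintain
    tiers: tuple[str, ...] = ()   # migration path across storage tiers, in order


@dataclass
class DataObject:
    name: str
    metadata: dict[str, str] = field(default_factory=dict)  # application metadata
    content: str = ""                                       # document body


# A rule pairs a predicate over the object with the policy it selects.
Rule = tuple[Callable[[DataObject], bool], IlmPolicy]


def classify(obj: DataObject, rules: list[Rule], default: IlmPolicy) -> IlmPolicy:
    """Return the policy of the first matching rule, else the default."""
    for predicate, policy in rules:
        if predicate(obj):
            return policy
    return default


# Illustrative values only; they are not taken from the disclosure.
DEFAULT_POLICY = IlmPolicy("default", retention_days=365, worm=False,
                           encrypt=False, replicas=1)

RULES: list[Rule] = [
    # Classified by application-specific metadata, not by file name or access time.
    (lambda o: o.metadata.get("type") == "email",
     IlmPolicy("email-7y", retention_days=7 * 365, worm=True, encrypt=True,
               replicas=2, tiers=("disk", "tape"))),
    # Classified by the document's actual content.
    (lambda o: "confidential" in o.content.lower(),
     IlmPolicy("confidential", retention_days=10 * 365, worm=True, encrypt=True,
               replicas=3)),
]
```

In this sketch the first matching rule wins, so rule order encodes priority: an email that also mentions "confidential" receives the email policy. A system could instead merge the requirements of all matching rules (for example, keeping the longest retention and the highest replica count).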

  Another critical requirement for future storage and archive solutions is a high ingestion rate, in terms of both the amount of data deposited and the number of data objects created within a given period of time. A high ingestion rate is often required by large organizations and in certain industry sectors, where the number of unstructured data records, such as emails, generated within a short time interval can be staggering.

However, few existing data store solutions can meet all of the requirements mentioned above. A typical scalable data store today consists of one or more object systems (here, by object systems we refer to a wide range of storage systems, including file systems, object storage systems, tape libraries, etc.) for storing the content (and possibly some basic attributes) of data objects, and one or more metadata servers for managing the objects. The typical data flow in such a system is as follows: a new data record arrives; the metadata server is notified and decides whether to accept the object creation request; if not, the request is rejected; otherwise, one or more object systems are selected to store the data...
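The baseline creation path described above can be modeled in a few lines. This is a minimal, synchronous Python sketch under stated assumptions, not the disclosure's design. `ObjectSystem`, `MetadataServer`, the admission rules (reject duplicate keys and oversized objects), and the most-free-space placement rule are all hypothetical. The catalog update after the write stands in for steps that the excerpt truncates.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectSystem:
    """Stand-in for a file system, object store, tape library, etc."""
    name: str
    capacity: int                                   # bytes (hypothetical)
    used: int = 0
    objects: dict[str, bytes] = field(default_factory=dict)

    def free(self) -> int:
        return self.capacity - self.used

    def store(self, key: str, data: bytes) -> None:
        self.objects[key] = data
        self.used += len(data)


class MetadataServer:
    """Hypothetical metadata server that mediates every object creation."""

    def __init__(self, systems: list[ObjectSystem], replicas: int = 1,
                 max_object_size: int = 1 << 20) -> None:
        self.systems = systems
        self.replicas = replicas
        self.max_object_size = max_object_size
        self.catalog: dict[str, list[str]] = {}     # object key -> system names

    def admit(self, key: str, data: bytes) -> bool:
        # Assumed admission rules: no duplicate keys, no oversized objects.
        return key not in self.catalog and len(data) <= self.max_object_size

    def select(self, size: int) -> list[ObjectSystem]:
        # Assumed placement rule: systems with the most free space first.
        fits = [s for s in self.systems if s.free() >= size]
        fits.sort(key=lambda s: s.free(), reverse=True)
        return fits[: self.replicas]

    def ingest(self, key: str, data: bytes) -> bool:
        """Run one creation request through the path described in the text."""
        if not self.admit(key, data):
            return False                            # request rejected
        targets = self.select(len(data))
        if len(targets) < self.replicas:
            return False                            # cannot place all replicas
        for system in targets:
            system.store(key, data)
        # Assumption: record the placement; the excerpt truncates the later steps.
        self.catalog[key] = [s.name for s in targets]
        return True
```

In this model, the single metadata server handles every creation request before any data reaches an object system. As a result, its admission and placement work sits on the critical path of each object created.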