
An Extended Scalable Hybrid Approach for Record Retention Management System to Adapt to Big Data
Disclosure Number: IPCOM000238968D
Publication Date: 2014-Sep-29

Publishing Venue

The Prior Art Database


A hybrid approach is proposed to resolve the scalability and performance issues with big data faced by most records retention management systems.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 15% of the total text.


An Extended Scalable Hybrid Approach for Record Retention Management System to Adapt to Big Data

A records management system manages records, which are content of any type stating results achieved, pertaining to activities performed, or providing evidence of those activities.

One critical requirement of a records management system is timely records disposal, which helps companies and organizations comply with internal, industry, and governmental regulations and laws. Timely disposal also reduces costs in areas such as litigation, operations, and records storage.

In records management systems, records are organized in a customizable hierarchical structure called a file plan. Each record in the file plan has a retention schedule for life cycle management, either inherited from its parent folder or assigned directly. The retention schedule specifies what moves a record into a given phase (or stage) and when, when the record leaves that phase, and when it is exported or disposed of. For example, when a record is declared or created, it enters a phase named phase 1; after a fixed period, or when triggered by an event, it enters another phase; finally, it is disposed of, which deletes both the metadata and the content.
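The inheritance rule and the phase progression described above can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: the names `Phase`, `RetentionSchedule`, `Node`, `effective_schedule`, and `phase_on` are hypothetical, and the sketch assumes that every phase ends after a fixed period (event-based triggers are left out).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Phase:
    name: str
    duration_days: int  # assumption: each phase ends after a fixed period


@dataclass
class RetentionSchedule:
    phases: List[Phase]

    def phase_on(self, declared: date, today: date) -> str:
        """Return the phase name on `today`, or "disposed" once all phases have elapsed."""
        boundary = declared
        for phase in self.phases:
            boundary += timedelta(days=phase.duration_days)
            if today < boundary:
                return phase.name
        return "disposed"


@dataclass
class Node:
    """A folder or record in the file plan (hypothetical model)."""
    name: str
    parent: Optional["Node"] = None
    schedule: Optional[RetentionSchedule] = None  # None means "inherit"
    declared: Optional[date] = None               # set for records only

    def effective_schedule(self) -> Optional[RetentionSchedule]:
        """Walk up the file plan until a directly assigned schedule is found."""
        node: Optional[Node] = self
        while node is not None:
            if node.schedule is not None:
                return node.schedule
            node = node.parent
        return None


# Usage: the record has no schedule of its own, so it inherits the folder's.
schedule = RetentionSchedule([Phase("phase 1", 365), Phase("phase 2", 730)])
folder = Node("contracts", schedule=schedule)
record = Node("contract-001", parent=folder, declared=date(2014, 1, 1))
print(record.effective_schedule().phase_on(record.declared, date(2014, 9, 29)))
```

Resolving the schedule by walking up to the parent keeps each record small, but it also means that a record's life cycle cannot be computed without consulting its ancestors.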

To meet the requirement of timely records disposal, a records management system must identify eligible records and generate records retention reports according to user-defined criteria. For example, record administrators usually want a report of the records eligible for the next phase today, or within a specified date range, before performing an actual disposition or phase transition. To generate the retention report, back-end threads or processes are launched to calculate the life cycle of each folder or record within the specified file plan scope. As the volume of records increases, so does the time cost. According to test results, with hundreds of millions of records, generating a records retention report can take tens of hours even in an optimized runtime environment. Although report generation is usually performed asynchronously in the background, its performance, scalability, and impact on overall system performance are unacceptable to customers.
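To make the cost of this kind of report concrete, here is a minimal single-threaded sketch of a date-range report over a file plan. It is an illustration under simplifying assumptions rather than the system's actual algorithm: `Node`, `retention_days`, and `retention_report` are hypothetical names, each schedule is reduced to a single fixed retention period, and the file plan is held in memory instead of in a repository.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """A folder or record in the file plan (hypothetical model)."""
    name: str
    retention_days: Optional[int] = None  # None means "inherit from parent"
    declared: Optional[date] = None       # set for records only
    children: List["Node"] = field(default_factory=list)


def retention_report(root: Node, start: date, end: date) -> Iterator[Tuple[str, date]]:
    """Yield (record name, due date) for records that become eligible within [start, end]."""
    stack: List[Tuple[Node, Optional[int]]] = [(root, None)]
    while stack:
        node, inherited = stack.pop()
        # A directly assigned period overrides the inherited one.
        days = node.retention_days if node.retention_days is not None else inherited
        if node.declared is not None and days is not None:
            due = node.declared + timedelta(days=days)
            if start <= due <= end:
                yield node.name, due
        # Push children in reverse so they are visited in their listed order.
        for child in reversed(node.children):
            stack.append((child, days))


# Usage: report the records that become eligible in the week starting 2014-09-29.
plan = Node("file plan", children=[
    Node("finance", retention_days=30, children=[
        Node("invoice-1", declared=date(2014, 9, 1)),
        Node("invoice-2", declared=date(2014, 9, 20)),
    ]),
    Node("legal", retention_days=3650, children=[
        Node("contract-1", declared=date(2014, 1, 1)),
        Node("memo-1", retention_days=10, declared=date(2014, 9, 25)),
    ]),
])
for name, due in retention_report(plan, date(2014, 9, 29), date(2014, 10, 6)):
    print(name, due.isoformat())
```

Every node in scope is visited once, so the running time grows linearly with the number of folders and records, regardless of how few records end up in the report.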

The drawbacks of the currently known approaches are:
1. Records management systems can distribute the workload in the application layer, but the centralized repository, such as a relational database or content management system, is the bottleneck.

2. The file plan of a records management system is a hierarchical structure, which is hard to partition in the centralized repository.

3. The retention schedule applied to each folder or record can be inherited from its parent, so a records management system usually needs to traverse the targeted sub-tree of the file plan depth-first to generate the records retention report.



4. Records retention reports can not be generated direc...