A Systematic Framework For Managing Data Rules At Scale: For Big Data, The Cloud, And Beyond
Publication Date: 2015-Jun-15
The IP.com Prior Art Database
Disclosed is a method and system to ensure the currency of data validation rules and prevent the generation of invalid results in a database system. The automated system classifies data rules according to the core intent of each rule, and then complements this structure by classifying the metadata changes that may be made to the columns.
A well-structured enterprise information management system uses data rules both to specify and to validate that the format and meaning of data match corporate intent. However, as the number of data rules in production grows and data formats change as systems evolve, management costs can increase significantly. Tedious, time-consuming, and often error-prone remediation is required to ensure that metadata changes are properly flagged and that the corresponding rules are updated. In the absence of any structural support for this activity, the person performing the remediation must visually review each column, check its metadata format, read the data rule(s) associated with the column, and then decide whether each rule is still appropriate.
Data validated for correctness is one of an enterprise's most valuable currencies. Executing data validation rules at the data source during movement (i.e., Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT)) or at the destination is essential to ensuring data quality and establishing the correctness of values and formats. Enterprises spend considerable resources creating and executing data validation rules and taking corrective action on the exceptions that the validation process finds.
Data validation rules are based on metadata collected from the data sources. Over time, metadata and rules become unsynchronized, resulting in obsolete rules. Obsolete rules can produce invalid exceptions or miss data violations (i.e., violations that a rule execution would have flagged before the metadata change but that now go undetected). Invalid exceptions waste the resources spent analyzing and remediating wrongly reported exceptions. Missed exceptions lead to incorrect assumptions about the quality of the data used to drive business decisions, and they may be harder to detect.
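The two failure modes can be illustrated with a minimal Python sketch. The rules and metadata changes below are hypothetical examples chosen for illustration, not taken from the disclosure: a format rule whose length bound was copied from a VARCHAR(5) column that is later widened to VARCHAR(8), and a not-null completeness rule on a column whose definition later changes to NOT NULL with an empty-string default.

```python
from typing import Callable, List, Optional

Values = List[Optional[str]]  # None stands in for SQL NULL


def not_null_rule(values: Values) -> List[int]:
    """Hypothetical completeness rule: flag the positions of NULL (None) values."""
    return [i for i, v in enumerate(values) if v is None]


def max_length_rule(max_length: int) -> Callable[[Values], List[int]]:
    """Hypothetical format rule; max_length is copied from column metadata
    (e.g. VARCHAR(5)) when the rule is created and is never refreshed."""
    def check(values: Values) -> List[int]:
        return [i for i, v in enumerate(values) if v is not None and len(v) > max_length]
    return check


# Invalid exception: the column is widened from VARCHAR(5) to VARCHAR(8),
# but the stale rule still enforces the old bound of 5.
stale_format_rule = max_length_rule(5)
after_widening = ["AB123", "ABCDE12"]  # both values fit VARCHAR(8)
print(stale_format_rule(after_widening))  # [1]: a valid value is reported

# Missed violation: the column changes from nullable to NOT NULL DEFAULT '',
# so a missing value that used to arrive as NULL now arrives as ''.
before_change = ["X1", None]
after_change = ["X1", ""]
print(not_null_rule(before_change))  # [1]: missing value is flagged
print(not_null_rule(after_change))   # []: the same gap now passes
```

The second case shows why missed violations can be hard to spot: the obsolete rule still executes successfully and simply reports nothing.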
Known approaches to this problem typically combine periodic synchronization (i.e., re-import) and analysis of metadata to assess, within a governance process, the impact of the changes on existing data validation rules.
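The analysis step that follows such a re-import can be sketched as a comparison of two metadata snapshots. This minimal Python sketch assumes a hypothetical `ColumnMetadata` record holding only data type, length, and nullability; real metadata repositories carry many more attributes, and all names here are illustrative.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ColumnMetadata:
    """Hypothetical per-column metadata captured at import time."""
    data_type: str
    length: Optional[int]
    nullable: bool


Snapshot = Dict[str, ColumnMetadata]       # column name -> metadata
Change = Tuple[str, str, object, object]   # (column, attribute, old, new)


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[Change]:
    """List removed columns, then added columns, then attribute-level changes."""
    changes: List[Change] = []
    for col in sorted(old.keys() - new.keys()):
        changes.append((col, "column", "present", "removed"))
    for col in sorted(new.keys() - old.keys()):
        changes.append((col, "column", "absent", "added"))
    for col in sorted(old.keys() & new.keys()):
        before, after = asdict(old[col]), asdict(new[col])
        for attr, old_value in before.items():
            if after[attr] != old_value:
                changes.append((col, attr, old_value, after[attr]))
    return changes


previous = {
    "product_code": ColumnMetadata("VARCHAR", 5, False),
    "legacy_flag": ColumnMetadata("CHAR", 1, True),
}
reimported = {
    "product_code": ColumnMetadata("VARCHAR", 8, False),
    "region": ColumnMetadata("VARCHAR", 3, True),
}
for change in diff_snapshots(previous, reimported):
    print(change)
```

A diff like this only says what changed; deciding which rules each change affects is still left to the reviewer, which is the manual effort described earlier.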
The system disclosed herein classifies data rules according to the core intent of each rule. A classification of the metadata changes that may be made to columns then complements this structure. The system maps each metadata change to the corresponding rule change, giving a precise indication of whether the change invalidates a rule or requires a review. In addition, the system gives the reviewer an indication of why the rule is under review.
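The mapping from (rule intent, metadata change) to an outcome can be sketched as a lookup table. In this minimal Python sketch the intent categories, change categories, impact levels, and reason strings are all hypothetical placeholders, since the disclosure does not enumerate them here; the reason string plays the role of the indicator shown to the reviewer.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class RuleIntent(Enum):
    """Hypothetical intent categories for data rules."""
    FORMAT = "format"              # e.g. length or pattern checks
    RANGE = "range"                # e.g. minimum/maximum value checks
    COMPLETENESS = "completeness"  # e.g. not-null checks


class ChangeKind(Enum):
    """Hypothetical categories of column metadata change."""
    COLUMN_REMOVED = "column_removed"
    TYPE_CHANGED = "type_changed"
    LENGTH_CHANGED = "length_changed"
    NULLABILITY_CHANGED = "nullability_changed"


class Impact(Enum):
    NO_IMPACT = 0
    REVIEW = 1
    INVALIDATES = 2


# (rule intent, metadata change) -> (impact, reason shown to the reviewer).
# Pairs that are not listed are treated as having no impact.
IMPACT_MAP: Dict[Tuple[RuleIntent, ChangeKind], Tuple[Impact, str]] = {
    (intent, ChangeKind.COLUMN_REMOVED): (Impact.INVALIDATES, "bound column was removed")
    for intent in RuleIntent
}
IMPACT_MAP.update({
    (RuleIntent.FORMAT, ChangeKind.TYPE_CHANGED): (Impact.INVALIDATES, "format check assumes the old data type"),
    (RuleIntent.RANGE, ChangeKind.TYPE_CHANGED): (Impact.INVALIDATES, "range bounds assume the old data type"),
    (RuleIntent.FORMAT, ChangeKind.LENGTH_CHANGED): (Impact.REVIEW, "format check may encode the old length"),
    (RuleIntent.COMPLETENESS, ChangeKind.NULLABILITY_CHANGED): (Impact.REVIEW, "column nullability changed"),
})


def assess(intent: RuleIntent, changes: Iterable[ChangeKind]) -> Tuple[Impact, List[str]]:
    """Return the most severe impact of the changes on a rule, with its reasons."""
    worst = Impact.NO_IMPACT
    reasons: List[str] = []
    for change in changes:
        impact, reason = IMPACT_MAP.get((intent, change), (Impact.NO_IMPACT, ""))
        if impact.value > worst.value:
            worst, reasons = impact, [reason]
        elif impact is worst and impact is not Impact.NO_IMPACT:
            reasons.append(reason)
    return worst, reasons


print(assess(RuleIntent.FORMAT, [ChangeKind.LENGTH_CHANGED]))
print(assess(RuleIntent.RANGE, [ChangeKind.NULLABILITY_CHANGED]))
```

Here a length change on a format rule yields `REVIEW` with its reason, while a nullability change on a range rule yields `NO_IMPACT`. Keeping only the most severe outcome per rule is one possible design choice; a system could equally report every (change, reason) pair.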
With the system, a new method is applied to ensure the currency of data validation rules and prevent the generation of invalid results. In response to the creation of a rule and its binding to a given data source for execution, the system persists a snapshot of the metadata of the data source...