SYSTEMS AND METHODS TO MONITOR DATA QUALITY ISSUES THROUGH A RULE BASED APPROACH
Publication Date: 2015-Mar-26
The IP.com Prior Art Database
The disclosed invention provides a technique that uses a rule-based approach for monitoring data quality issues in a workflow in an industrial setting. The technique includes a data quality monitoring system (DQMS), which is designed to quantify, measure and report data quality issues at various points of a user-defined workflow. The DQMS measures the quality of data and generates a scorecard that lets the user check whether the data is good enough for downstream applications and, if it is not, suggests cleaning the data before moving ahead. Poor-quality data are flagged based on rules. The DQMS enables users to create most relatively simple rules on the fly, and it also lets users change threshold parameters easily. The user then picks a data set and applies the rules to it. This generates a report that provides a high-level summary of the quality of the data. The user can also find detailed row-level information on DQ defects and develop methods to fix them.
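As an illustration only, the rule, threshold and scorecard flow summarized above might look like the following minimal Python (3.9+) sketch. The names `Rule` and `build_scorecard`, the example fields (`engine_id`, `hours`) and the 5% default threshold are hypothetical assumptions rather than part of the disclosure; each rule here is simply a predicate that returns True for a defective row.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]  # one record of the data set


@dataclass
class Rule:
    """Hypothetical DQ rule: a name, a defect predicate and a tolerated defect rate."""
    name: str
    is_defective: Callable[[Row], bool]  # returns True when a row violates the rule
    threshold: float = 0.05  # assumed maximum fraction of defective rows that still passes


def build_scorecard(rows: list[Row], rules: list[Rule]) -> dict[str, dict]:
    """Apply every rule to every row and return a per-rule summary with row-level detail."""
    scorecard: dict[str, dict] = {}
    for rule in rules:
        defective_rows = [i for i, row in enumerate(rows) if rule.is_defective(row)]
        rate = len(defective_rows) / len(rows) if rows else 0.0
        scorecard[rule.name] = {
            "defect_rate": rate,               # high-level summary
            "passed": rate <= rule.threshold,  # compared against the user's threshold
            "defective_rows": defective_rows,  # row-level detail for fixing defects
        }
    return scorecard


rows = [
    {"engine_id": "E1", "hours": 120},
    {"engine_id": None, "hours": 80},
    {"engine_id": "E3", "hours": -5},
    {"engine_id": "E4", "hours": 300},
]
rules = [
    Rule("engine_id present", lambda r: r["engine_id"] is None),
    # Thresholds are plain parameters, so relaxing one is a one-argument change.
    Rule("hours non-negative", lambda r: r["hours"] is not None and r["hours"] < 0, threshold=0.30),
]

for name, result in build_scorecard(rows, rules).items():
    status = "PASS" if result["passed"] else "FAIL"
    print(f"{name}: {status} rate={result['defect_rate']:.2f} rows={result['defective_rows']}")
```

With this sample data the first rule fails (25% of rows defective against a 5% threshold) while the second passes under its relaxed 30% threshold, and each entry keeps the indices of the offending rows for follow-up cleaning.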
The present invention relates generally to data quality (DQ) systems and, more particularly, to a technique for monitoring data quality issues in a workflow in an industrial setting.
Workflows in industrial settings generally generate huge quantities of data at every step, so the quality of the data must be monitored at every step of the workflow. Data quality monitoring is necessary to flag data quality (DQ) issues as close to the source as possible, which limits the propagation of errors downstream in the workflow. Data quality monitoring is also needed to compare the quality of data from two similar sources, for example two shops or two engines, so that the relative quality of the data obtained is immediately apparent.
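A minimal sketch of the source-to-source comparison mentioned above, assuming each record carries a hypothetical `source` field (for example, a shop or engine identifier) and that the same defect predicate is applied to every source. The function name, field names and sample values are illustrative only.

```python
from collections import defaultdict
from typing import Callable


def defect_rate_by_source(rows: list[dict], is_defective: Callable[[dict], bool],
                          source_key: str = "source") -> dict[str, float]:
    """Return the fraction of defective rows for each data source."""
    totals: dict[str, int] = defaultdict(int)
    defects: dict[str, int] = defaultdict(int)
    for row in rows:
        src = row[source_key]
        totals[src] += 1
        if is_defective(row):
            defects[src] += 1
    # defaultdict returns 0 for sources with no defects
    return {src: defects[src] / totals[src] for src in totals}


rows = [
    {"source": "shop_A", "serial": "123"},
    {"source": "shop_A", "serial": ""},
    {"source": "shop_B", "serial": "456"},
    {"source": "shop_B", "serial": "789"},
]
# Same rule for both shops: an empty serial number counts as a defect.
rates = defect_rate_by_source(rows, lambda r: not r["serial"])
print(rates)  # {'shop_A': 0.5, 'shop_B': 0.0}
```

Because both sources are scored with one rule, the resulting rates are directly comparable.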
Conventional techniques rely on a highly manual and time-consuming process for cleaning data. This manual cleaning process has been replaced by DQ systems that are based on certain software technologies.
Such software tools check simple DQ problems, for example the consistency of dates and misspelled fields, among others. However, more complicated DQ issues, defined by rules that depend on more than one variable, are not readily addressed by these tools. Generic rule engines, on the other hand, are not custom-made for data quality and may not be as easy to use. Further, such tools require an information technology (IT) aware person to hard-code every rule that flags DQ defects. Consequently, the functional end user cannot create new rules on the fly.
It would be desirable to have an efficient technique to monitor data quality issues in a workflow of an industrial setting.
BRIEF DESCRIPTION OF DRAWINGS
Figure 1 depicts a process overview of the DQMS applied over a workflow or over time to check the quality of the data.
Figure 2 depicts a high-level design of the data quality monitoring system (DQMS).
Figure 3 depicts a screenshot of creating new rule libraries (rule sets) and adding new rules to a preexisting rule set.
Figure 4 depicts an example of how a rule is designed with four elements: level, scope, filter and defective element.
Figure 5 depicts a screenshot of the rule creation window in the DQMS, which includes separate tabs for scope, filter and defect.
Figure 6 depicts a screenshot of the clause creation window in the DQMS, where the user chooses a variable, an operator and a value.
Figure 7 depicts a screenshot of the overall summary section of a report generated by the DQMS.
Figure 8 depicts a screenshot of the dependency summary section of a report generated by the DQMS.
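The figure descriptions name four rule elements (level, scope, filter and defective element) and a clause built from a variable, an operator and a value. One possible way to hold these as data, so that a functional user could assemble a multi-variable rule without hard-coding it, is sketched below in Python (3.9+). The semantics are assumptions, not the disclosed design: scope is read as the name of the data set a rule covers, the filter clauses select the rows the rule applies to, the defect clauses (combined with AND) mark a row as defective, and level is carried only as a label. `Clause`, `RuleSpec`, the operator table and the example fields are hypothetical.

```python
import operator
from dataclasses import dataclass, field

# Operators a user might pick in the clause creation window (assumed set).
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass
class Clause:
    """One clause: a variable, an operator and a value."""
    variable: str
    op: str
    value: object

    def holds(self, row: dict) -> bool:
        actual = row.get(self.variable)  # a missing variable is read as None
        try:
            return bool(OPERATORS[self.op](actual, self.value))
        except TypeError:
            # e.g. ordering None against a number: treat the clause as not satisfied
            return False


@dataclass
class RuleSpec:
    """Hypothetical rule built from data rather than hard-coded logic."""
    name: str
    scope: str  # name of the data set the rule covers (assumed meaning)
    row_filter: list[Clause] = field(default_factory=list)  # rows the rule applies to
    defect: list[Clause] = field(default_factory=list)      # all must hold to flag a defect
    level: str = "row"  # granularity label; only row level is evaluated in this sketch

    def defective_rows(self, datasets: dict[str, list[dict]]) -> list[int]:
        """Indices of in-scope rows that pass the filter and meet every defect clause."""
        rows = datasets.get(self.scope, [])
        return [
            i
            for i, row in enumerate(rows)
            if all(c.holds(row) for c in self.row_filter)
            and all(c.holds(row) for c in self.defect)
        ]


datasets = {
    "engine_removals": [
        {"engine_id": "E1", "status": "removed", "removal_hours": 1500},
        {"engine_id": "E2", "status": "removed", "removal_hours": 0},
        {"engine_id": "E3", "status": "installed", "removal_hours": 0},
    ]
}

# Two-variable rule: among removed engines, removal hours must be positive.
rule = RuleSpec(
    name="removed engines need positive hours",
    scope="engine_removals",
    row_filter=[Clause("status", "==", "removed")],
    defect=[Clause("removal_hours", "<=", 0)],
)
print(rule.defective_rows(datasets))  # [1]
```

Only the second record is flagged: the third also has zero hours but is excluded by the filter on `status`. Because every part of the rule is plain data, it could in principle be stored in a rule library and edited from a form; expressing OR between defect clauses would need an extension not shown here.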