Matching As A Service - A Cloud based Approach to Probabilistic Data Matching

IP.com Disclosure Number: IPCOM000234680D
Publication Date: 2014-Jan-28
Document File: 6 page(s) / 268K

Publishing Venue

The IP.com Prior Art Database

Abstract

This proposal describes an offering called 'Matching As A Service': a cloud-based approach to provisioning probabilistic data matching and linking services, which applications consume on a subscription or metered basis.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 18% of the total text.


Background

With data volumes growing exponentially, data matching has become a basic need for organizations that want to create and maintain clean, consistent data. At the same time, it is not always feasible for organizations, especially in emerging markets, to invest in highly sophisticated matching solutions, owing to challenges such as poor data quality, prohibitive infrastructure costs, and lack of trust among the various stakeholders. Organizations also need to address newer challenges such as Big Data, which can otherwise be disruptive. Matching as a Service (MaaS) can be a good fit for emerging markets because the initial investment is markedly lower and the various business units within an organization can adopt a best-of-breed matching solution. If they later proceed with a critical business initiative such as Master Data Management (MDM), the results of MaaS can be used to cleanse the data before it is mastered within MDM. However, offering matching as a service has been a challenge because the solution must be customized for different datasets and must cater to the individual matching requirements of different organizations. This is because traditional systems have used a deterministic approach to matching, which is highly dependent on custom business rules. In this paper, we propose an approach using probabilistic techniques, which are more flexible, provide a higher level of accuracy, and do not involve complex and cumbersome rules.

Known Solutions and Drawbacks

Most enterprise matching software today uses deterministic algorithms for data matching. These algorithms rely on pre-coded expert rules and resource dictionaries to define how records should be parsed and standardized. Such systems fail when the rules are no longer appropriate for the data collected, and the rules must then be updated to achieve respectable accuracy. In contrast, machine-learning-based probabilistic matching algorithms rely on training datasets to compute attribute distributions and are considered more accurate than deterministic matching [7]. However, not all probabilistic matching engines are built equal [10], and different probabilistic algorithms applied to the same set of circumstances do not yield results with the same degree of accuracy.
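
The contrast between the two approaches can be illustrated with a small sketch. The disclosure does not say which probabilistic algorithm it uses; the example below assumes a Fellegi-Sunter style model, a standard formulation of probabilistic record linkage, in which each attribute contributes a log-likelihood-ratio weight. The attribute names, the m/u probabilities, the thresholds, and all function names are hypothetical and chosen only for illustration. In practice the m/u values would be estimated from training data rather than hard-coded.

```python
import math

# Hypothetical per-attribute probabilities (illustrative values, not from the source):
#   m = P(attribute agrees | the two records are a true match)
#   u = P(attribute agrees | the two records are not a match)
M_U = {
    "name": (0.95, 0.01),
    "dob": (0.98, 0.005),
    "city": (0.90, 0.10),
}


def normalize(value):
    """Lower-case the value and collapse runs of whitespace."""
    return " ".join(str(value).lower().split())


def deterministic_match(a, b):
    """Rule-based matching: every attribute must agree after normalization."""
    return all(normalize(a[f]) == normalize(b[f]) for f in M_U)


def match_weight(a, b):
    """Sum of log2 likelihood ratios over all attributes (Fellegi-Sunter style)."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if normalize(a[field]) == normalize(b[field]):
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts evidence
    return total


def classify(weight, upper=8.0, lower=0.0):
    """Map a weight to a decision; both thresholds are assumed values."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


# Two records that agree on name and date of birth but differ on city.
rec_a = {"name": "John Smith", "dob": "1980-01-01", "city": "Pune"}
rec_b = {"name": "john  smith", "dob": "1980-01-01", "city": "Mumbai"}

print(deterministic_match(rec_a, rec_b))
print(classify(match_weight(rec_a, rec_b)))
```

Under these assumed values, the exact-agreement rule rejects the pair because of the single differing city, while the probabilistic score still lets strong agreement on name and date of birth outweigh it. Pairs that fall between the two thresholds are flagged as possible matches, typically for manual review.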

Organizations in emerging markets today follow different approaches to meet their data matching and de-duplication needs. These include:


A. Data Cleansing and Matching - Outsourced


Some organizations outsource matching to third-party organizations, which perform it offline. Here, data is typically provided in a spreadsheet to the third party, which in turn performs the required data standardization and matching using a combination of manual effort and non-standard software tools. While this approach is known to be hig...