
Business Data Matching Accelerator
Disclosure Number: IPCOM000249548D
Publication Date: 2017-Mar-03
Document File: 3 page(s) / 96K

Publishing Venue

The Prior Art Database


This disclosure describes an idea in the area of probabilistic matching and fuzzy logic, using a tool that enables businesses to link previously unlinkable data. Projects of this type often require a lot of input from business subject matter experts before any development starts. Below I propose an automated method for producing draft matching specifications and code, based on my experience of implementing the tool over the last 20 years.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 94% of the total text.


Business Data Matching Accelerator

When you arrive at a new client site and are tasked with matching business entities using probabilistic matching technologies, it is always difficult to work out which columns to use first. Client subject matter experts are often unavailable, and there is usually pressure to deliver results quickly.

If you don’t know the data, you have to crawl manually through many columns and records to work out what may work. This is very time-consuming and can take weeks.

We propose a tool that accepts two sets of data and automatically works out which columns are good candidates for matching between the sets, without the manual data investigation step.

The novelty of this invention includes:

• Significantly reduces the time needed to start the match design process

• Requires little configuration

• Load data into an area

• Profile data with IA to get inferred data types

• Use inferred data types to work out all the column matching candidates

• Run frequency analysis over both sets of data

• Create match specifications for pairs of columns with the same data type

• For each data type, use standard blocking and fuzzy techniques as the default match commands and blocks, with different specs for dates, numbers and strings

• Then compile results into a report

• Create a first pass of match specification that can be loaded into a matching tool

• The tool creates the match commands; blocks may also have to be offered as suggestions

• See diagram below
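
The steps listed above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosed tool: each data set is assumed to be a dictionary mapping column names to lists of string values, type inference is a simple stand-in for the IA profiling step, dates are assumed to be in YYYY-MM-DD form, and the comparison names in DEFAULT_COMPARISON are hypothetical placeholders rather than commands from any real matching product. Blocking is left out; the distinct-value ratios in the output are only meant to help a reviewer suggest blocks.

```python
from collections import Counter
from datetime import datetime
from itertools import product

# Hypothetical default comparison per inferred type (placeholder names only).
DEFAULT_COMPARISON = {
    "date": "date_tolerance",
    "number": "numeric_tolerance",
    "string": "fuzzy_string",
}


def non_empty(values):
    """Strip whitespace and drop empty or missing values."""
    return [v.strip() for v in values if v and v.strip()]


def infer_type(values):
    """Infer 'number', 'date' or 'string' from a column's non-empty values."""
    present = non_empty(values)
    if not present:
        return "string"

    def all_parse(parser):
        try:
            for v in present:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "number"
    # Assumption: dates arrive as YYYY-MM-DD.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


def profile(table):
    """Frequency analysis: inferred type, distinct ratio and top values per column."""
    result = {}
    for name, values in table.items():
        present = non_empty(values)
        freq = Counter(present)
        result[name] = {
            "type": infer_type(values),
            "distinct_ratio": len(freq) / len(present) if present else 0.0,
            "top_values": freq.most_common(3),
        }
    return result


def draft_match_spec(left, right):
    """Pair same-typed columns across two data sets with a default comparison."""
    left_profile, right_profile = profile(left), profile(right)
    spec = []
    for (lcol, lp), (rcol, rp) in product(left_profile.items(), right_profile.items()):
        if lp["type"] != rp["type"]:
            continue  # only columns with the same inferred type are candidates
        spec.append({
            "left": lcol,
            "right": rcol,
            "type": lp["type"],
            "comparison": DEFAULT_COMPARISON[lp["type"]],
            "left_distinct_ratio": lp["distinct_ratio"],
            "right_distinct_ratio": rp["distinct_ratio"],
        })
    return spec


# Made-up sample data, for illustration only.
customers = {
    "name": ["Ann Lee", "Bob Ray", "Cy Dunn", "Di Wu"],
    "dob": ["1980-01-02", "1975-06-30", "1990-11-11", "1980-01-02"],
    "region": ["N", "N", "S", "S"],
}
accounts = {
    "holder": ["Anne Lee", "Bob Ray", "Cy Dun", "Dee Wu"],
    "birth_date": ["1980-01-02", "1975-06-30", "1990-11-11", "1962-03-04"],
    "balance": ["10.50", "200", "0", "35"],
}
spec = draft_match_spec(customers, accounts)
for row in spec:
    print(row["left"], "<->", row["right"], row["type"], row["comparison"])
```

Pairing on type alone also proposes weak candidates (here, region against holder), which is why the output is best treated as a first-pass draft for review rather than a finished specification.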



Probabilistic Matching