System and method to improve the performance of the candidate list generation process of an Entity Analytics system using in-memory, read-only cache

IP.com Disclosure Number: IPCOM000212210D
Publication Date: 2011-Nov-04

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed here is a system and method that improves the performance of an entity resolution process by expediting one of its key sub-processes, candidate list generation, through the use of an in-memory, read-only cache. In the method described, a read-only cache is maintained with information about a set of high-priority entities. The entities to be cached are chosen on the basis of configurable entity priority rules. Mechanisms are also provided to keep the cache up to date, so that the cached entity information remains accurate and reflects the changes made to the entity data by the resolution processes.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 12% of the total text.

System and method to improve the performance of the candidate list generation process of an Entity Analytics system using in-memory, read-only cache

When a record is fed into the entity resolution engine, a list of probable matching entities is generated. This process is called candidate list generation. Only after it completes does the rigorous process of scoring/match-making take place against the identified list of candidates. Candidate list generation is therefore a very important phase of the entity resolution process and should be executed with the best possible accuracy and efficiency.
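
The two phases described above can be sketched roughly as follows. This is a minimal illustration and not the disclosed implementation: the `Entity` class, the attribute index, the overlap-count score, and the sample entities are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity: an id plus a set of (attribute type, value) pairs."""
    entity_id: int
    attributes: frozenset


def build_index(entities):
    """Map each (attribute type, value) pair to the ids of entities holding it."""
    index = defaultdict(set)
    for entity in entities:
        for attribute in entity.attributes:
            index[attribute].add(entity.entity_id)
    return index


def generate_candidates(record, index):
    """Phase 1: collect every entity sharing at least one attribute with the record."""
    candidates = set()
    for attribute in record:
        candidates |= index.get(attribute, set())
    return candidates


def score(record, entity):
    """Phase 2 (placeholder): count shared attributes.

    A real engine would apply far more rigorous, weighted matching here.
    """
    return len(record & entity.attributes)


# Illustrative data only.
entities = [
    Entity(1, frozenset({("NAME", "DANIEL FARADAY"), ("SSN", "12345678")})),
    Entity(2, frozenset({("NAME", "JOHN SMITH"), ("PHONE", "91 552 54 72")})),
    Entity(3, frozenset({("NAME", "JANE DOE")})),
]
by_id = {e.entity_id: e for e in entities}
index = build_index(entities)

record = frozenset({
    ("NAME", "DANIEL FARADAY"),
    ("SSN", "12345678"),
    ("PHONE", "91 552 54 72"),
})

# Scoring runs only against the candidates, never against the full entity set.
candidates = generate_candidates(record, index)
ranked = sorted(candidates, key=lambda i: score(record, by_id[i]), reverse=True)
```

In this sketch, entity 3 shares no attribute with the record, so it never reaches the scoring phase. That is the point of generating a candidate list first.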

Introduction to entity analytics and related terms:

An Entity is defined as a data structure that uniquely represents a particular person. The entity is associated with a variety of attributes, such as name, address, and phone number. Entity Analytics mainly comprises Entity Identification/Building, Entity Resolution, and Entity Relationships. More recently, associating transactions with Entities has emerged as an additional facet.

Entity Analytics is an operational intelligence process, typically powered by an identity resolution engine or middleware stack, whereby organizations can connect disparate data sources with a view to understanding possible identity matches and non-obvious relationships across multiple data silos.

It involves analyzing all of the information relating to individuals and/or entities from multiple sources of data, and then applying likelihood and probability scoring to determine which identities are a match and what, if any, non-obvious relationships exist between those identities.

It thus helps organizations solve business problems related to recognizing the true identity of someone or something ("who is who") and determining the potential value or danger of relationships ("who knows who") among customers, employees, vendors, and other external forces. It also provides immediate and actionable information to help prevent threat, fraud, abuse, and collusion in all industries.

In most popular implementations, these entities are stored in a relational database, which is called the entity database. This database also holds information about the obvious and non-obvious relationships that may exist among the various entities it contains.

Existing scheme for the generation of candidate lists during entity resolution process:

For every incoming record, candidate list generation is performed to determine the entities that may be connected to the incoming identity.


The incoming identity data is XML-based. A sample is shown below, illustrating the various data elements that comprise it.

100
A

16187
16187

M
DANIEL FARADAY

PH
91 552 54 72

SSN
12345678

H
121 MAPLE STREET
LAS VEGAS
NEVADA
89128-8351
USA

In the existing candidate list generation mechanism, queries are run against the entity database individually and sequentially, one for each data element of the incoming record, to find the candidates so that...
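
The one-query-per-data-element pattern described above can be sketched as follows, using an in-memory SQLite database as a stand-in for the entity database. The `entity_attribute` table, its columns, the attribute type labels, and the stored rows are hypothetical; this excerpt does not give the actual schema or queries.

```python
import sqlite3

# Hypothetical stand-in for the entity database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entity_attribute ("
    "entity_id INTEGER, attr_type TEXT, attr_value TEXT)"
)
conn.executemany(
    "INSERT INTO entity_attribute VALUES (?, ?, ?)",
    [
        (1, "NAME", "DANIEL FARADAY"),
        (1, "SSN", "12345678"),
        (2, "PHONE", "91 552 54 72"),
        (3, "NAME", "JANE DOE"),
    ],
)

# Data elements of the incoming record (values taken from the sample record;
# the type labels are assumed).
incoming = [
    ("NAME", "DANIEL FARADAY"),
    ("PHONE", "91 552 54 72"),
    ("SSN", "12345678"),
]


def sequential_candidates(conn, data_elements):
    """Run one query per data element, in order, and union the matches."""
    candidates = set()
    query_count = 0
    for attr_type, attr_value in data_elements:
        rows = conn.execute(
            "SELECT entity_id FROM entity_attribute "
            "WHERE attr_type = ? AND attr_value = ?",
            (attr_type, attr_value),
        ).fetchall()
        query_count += 1
        candidates.update(entity_id for (entity_id,) in rows)
    return candidates, query_count


candidates, query_count = sequential_candidates(conn, incoming)
```

In this sketch every data element costs one database round trip, so the number of queries grows with the number of elements in the incoming record.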