Person Name Analysis using pattern matching techniques

IP.com Disclosure Number: IPCOM000240407D
Publication Date: 2015-Jan-29
Document File: 5 page(s) / 200K

Publishing Venue

The IP.com Prior Art Database

Abstract

A person's name is valuable information, whether it appears in structured, semi-structured, or unstructured data. Person names can be used to interpret the context and significance of content. However, the multicultural and multilingual nature of name data makes it difficult to analyze, that is, to infer details from a name. Earlier implementations of person name analysis either depend on a very large dictionary of names, which makes the product heavy on memory and slow, or make too many assumptions, which leads to inaccurate results. The current submission proposes a pattern-based analysis of name data that can analyze a person name more accurately, faster, and with less memory. The pattern-based analysis infers metadata or properties of the person who bears the name: gender is perhaps the easiest, but other less obvious and more region-specific attributes, such as religion, can be inferred as well.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 46% of the total text.

Person Name Analysis using pattern matching techniques

Introduction: Person name information is important in structured, semi-structured, and unstructured data, and can be used to interpret the context and significance of content.

Background:


Problem Statement: The multicultural and multilingual nature of name data makes it difficult to analyze. Many attempts have been made to simplify the analysis of name data, and a few enterprise products offer a solution to this problem. However, they either depend on a very large dictionary of names, which makes the product heavy on memory and slow, or make too many assumptions, which leads to inaccurate results. These two aspects make the existing products hard to integrate.

Note: Analyzing a name means inferring metadata or properties of the person who bears it. Gender is perhaps the easiest, but other less obvious and more region-specific attributes, such as religion, can also be inferred.

Business Case: IBM InfoSphere QualityStage is used by major telecom and retail companies that have a large penetration within India. QualityStage is required to classify names in India by religion (mainly the two main religions in India, i.e., Hindu and Muslim) and by gender, because customers want to market customized products for a specific religion or gender, such as daily alerts, greetings, and festival-specific offers.

Known Solutions: To serve the above purpose, QualityStage currently maintains a file with a huge set of names, written like this:

Likhitha F
Rakesh M
Anand M
Padma F
...

QualityStage looks up this file to get the gender of a person. As of now, it has no method implemented to identify a person's religion.
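
The lookup described above can be sketched as follows. This is a minimal illustration in Python, not QualityStage's actual implementation: the helper names `load_gender_file` and `lookup_gender` and the in-memory sample are assumptions, and only the "Name Code" line layout is taken from the sample shown. The codes F and M are presumed to mean female and male.

```python
from typing import Dict, Iterable, Optional

# Sample lines in the "Name Code" layout shown above
# (F and M are presumed to mean female and male).
SAMPLE_LINES = ["Likhitha F", "Rakesh M", "Anand M", "Padma F"]


def load_gender_file(lines: Iterable[str]) -> Dict[str, str]:
    """Build a name -> gender-code table from 'Name Code' lines (hypothetical helper)."""
    table: Dict[str, str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, code = parts
        table[name.lower()] = code.upper()
    return table


def lookup_gender(table: Dict[str, str], name: str) -> Optional[str]:
    """Return the stored gender code, or None when the name is not in the file."""
    return table.get(name.strip().lower())


table = load_gender_file(SAMPLE_LINES)
print(lookup_gender(table, "Padma"))    # F
print(lookup_gender(table, "Lakshmi"))  # None: the name is missing from the file
```

In this sketch every name has to be stored for the lookup to succeed, so the table grows with the number of names, and a name that is absent from the file yields no result.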

Drawback of Existing Solutions:



1) Looking up such a huge file for every name is heavy on memory and slow, while a small dictionary yields few matches and hence poor results.


2) It is difficult to maintain a complete set of names. Some names will always be missing, and for those the lookup fails to retrieve any information, so this approach can have a high failure rate.


3) For the above two reasons, the approach cannot scale.


4) The approach cannot infer religion from a given name.

Need for a New Solution: A new, lighter solution is required to identify and analyze a person name more accurately, faster, and with less memory.

Summary:

The current submission proposes a solution for analyzing person name data without looking up and matching names in a dictionary. It consists of a self-learning algorithm that reads through the dictionary of prospective name data once, learns the patterns in the names, and remembers only the patterns, not the actual names. As a result, it does not have to go through all the names to find a match when asked to recognize a particular name, which makes it faster than the existing approaches. It is also memory efficient because it...
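
Because this abbreviated text cuts off before the algorithm itself is described, the following Python sketch is only one possible reading of "learns patterns in the names, and remembers only the patterns". It assumes that a pattern is a fixed-length name suffix and that a majority vote decides the label. Both choices, the function names `learn_suffix_patterns` and `classify`, and the test names Lalitha and Suresh are illustrative assumptions, not the method of the submission.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def learn_suffix_patterns(
    labelled_names: Iterable[Tuple[str, str]], suffix_len: int = 2
) -> Dict[str, str]:
    """Read the labelled names once and keep only suffix -> majority label."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for name, label in labelled_names:
        suffix = name.lower()[-suffix_len:]
        votes[suffix][label] += 1
    # Only the patterns survive; the names themselves are discarded.
    return {suffix: counts.most_common(1)[0][0] for suffix, counts in votes.items()}


def classify(patterns: Dict[str, str], name: str, suffix_len: int = 2) -> Optional[str]:
    """Label a name by its suffix pattern; None if no learned pattern matches."""
    return patterns.get(name.strip().lower()[-suffix_len:])


# Training data reuses the sample names and codes from the known-solution file.
training = [("Likhitha", "F"), ("Rakesh", "M"), ("Anand", "M"), ("Padma", "F")]
patterns = learn_suffix_patterns(training)
print(classify(patterns, "Lalitha"))  # F (shares the "ha" suffix with Likhitha)
print(classify(patterns, "Suresh"))   # M (shares the "sh" suffix with Rakesh)
```

With this toy training set, names that never appeared in the training data still receive a label, and the learned table holds one entry per distinct suffix rather than one per name. A real pattern learner would need far more data and a richer notion of pattern than a two-letter suffix.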