
Method to handle duplication and collisions in voice enabled name dialer applications

IP.com Disclosure Number: IPCOM000022260D
Original Publication Date: 2004-Mar-03
Included in the Prior Art Database: 2004-Mar-03
Document File: 3 page(s) / 32K

Publishing Venue

IBM

Abstract

Disclosed is a method to cluster duplicate names and homonym instances using hash codes and tables. The method is intended for applications that generate large grammars, such as Name Dialer and Directory Information systems, and that do not have a database component. The information, such as an employee telephone number, is stored in the grammar.

This is the abbreviated version, containing approximately 53% of the total text.


In voice-enabled name dialer applications, one of the most complex components is grammar generation. Names can be formed in many different ways depending on origin (for instance, Spanish names are formed differently from English or Chinese names). Common nicknames, such as William/Bill/Will, must be considered, as well as homonyms, that is, names with the same pronunciation but different spellings, such as Allan/Alan or Tracy/Tracie. Moreover, more than one person with the same name can be part of the directory (e.g., John Smith from Sales and John Smith from HR), and the algorithm that creates the grammar must take all of these aspects into consideration. Although the grammar generation procedure is generally executed during non-business hours, it should be efficient enough to handle large directories.

Grammar Generation Algorithm

The steps to generate grammars are:

1. Load input file nicknames.xml, containing the names with their corresponding common nicknames (e.g., William/Will/Bill)
2. Load input file homonyms.xml, containing names that sound the same but are written differently and represent different persons (e.g., Ozzy/Ozzie)
3. Load and parse input file names.xml, representing the directory of names
4. Grammar generation

To support a large directory of names, the algorithm must be efficient at loading the names, detecting duplicates, and applying nicknames and homonyms. It uses a series of hash tables to optimize performance. The algorithm is illustrated below with a sample scenario.

Load nicknames.xml & homonyms.xml

Suppose the following information is defined in nicknames.xml:

Name/Common Nickname    Hash Code
William                 2
Bill                    5
Will                    6

This step reads the file and creates a hash table as follows:

[Figure: hash table diagram, not preserved in the extracted text. Recoverable labels: key 2; names William, Bill, Will.]


The same procedure is used to load homonyms.xml (e.g., Ozzy <-> Ozzie).
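The loading step can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the XML layout of nicknames.xml and homonyms.xml is not given in the text, so the groups are supplied as in-memory lists, and the helper name build_variant_table is hypothetical. Python's built-in dict stands in for the hash table, so the hash codes shown above (2, 5, 6) are not reproduced. The sketch also assumes that every member of a group should resolve to the whole group.

```python
def build_variant_table(groups):
    """Map every name in a group to the full group of equivalent names.

    `groups` is a list of name lists, e.g. [["William", "Bill", "Will"]].
    Keys are lower-cased so lookups are case-insensitive (an assumption;
    the source does not say how case is handled).
    """
    table = {}
    for group in groups:
        members = tuple(group)
        for name in members:
            table[name.lower()] = members
    return table


# Sample data taken from the examples in the text.
nicknames = build_variant_table([["William", "Bill", "Will"]])
homonyms = build_variant_table([["Ozzy", "Ozzie"]])

print(nicknames["bill"])   # ('William', 'Bill', 'Will')
print(homonyms["ozzie"])   # ('Ozzy', 'Ozzie')
```

With this layout, looking up any variant takes one hash lookup on average, regardless of how many nickname or homonym groups were loaded.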

Load and Parse names.xml

After loading the common nicknames and homonyms, the system will load and parse names.xml. In this example, names.xml has the following entries:

Person     Given Name   Last Name   Pref Nick   Phone #   Location
Person 1   William      Osborne     Ozzie       1234      Boca Raton
Person 2   John         Smith       (none)      5678      Boca Raton
Person 3   John         Smith       (none)      9012      Yorktown
Person 4   Ozzy         Osborne     (none)      0987      Atlanta
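The sample directory above contains both kinds of collision the method targets: two entries named John Smith, and a preferred nickname (Ozzie Osborne) that sounds the same as another person's name (Ozzy Osborne). The Python sketch below shows one way such entries could be clustered in a single pass with a hash table. It is an illustration under stated assumptions, not the disclosed algorithm, whose remaining steps are truncated in this abbreviated text. The names HOMONYM_CANONICAL, spoken_key, and cluster are hypothetical, and nickname expansion (Bill/Will) is left out for brevity.

```python
from collections import defaultdict

# Sample directory from the text:
# (given name, last name, preferred nickname, phone, location)
DIRECTORY = [
    ("William", "Osborne", "Ozzie", "1234", "Boca Raton"),
    ("John", "Smith", None, "5678", "Boca Raton"),
    ("John", "Smith", None, "9012", "Yorktown"),
    ("Ozzy", "Osborne", None, "0987", "Atlanta"),
]

# Assumption: each homonym group is reduced to one canonical spelling.
HOMONYM_CANONICAL = {"ozzy": "ozzy", "ozzie": "ozzy"}


def spoken_key(first, last):
    """Build a key that is identical for names that sound the same."""
    first = first.lower()
    return (HOMONYM_CANONICAL.get(first, first), last.lower())


def cluster(directory):
    """Group directory entries under every spoken name that reaches them."""
    clusters = defaultdict(list)
    for given, last, pref, phone, location in directory:
        first_names = {given}
        if pref:
            first_names.add(pref)
        # A set of keys ensures one person is added at most once per key.
        for key in {spoken_key(f, last) for f in first_names}:
            clusters[key].append((phone, location))
    return clusters


clusters = cluster(DIRECTORY)
# Keys shared by more than one person need disambiguation in the grammar.
collisions = {k: v for k, v in clusters.items() if len(v) > 1}

for key, entries in sorted(collisions.items()):
    print(key, entries)
```

In this sketch each entry costs a constant number of hash operations, so the pass is linear in the directory size, which is in line with the efficiency goal stated earlier.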

For each person defined in names.xml, the algorithm will:

1. Get the special fields used to generate the name combinations: Given Name, Last Name and...