Browse Prior Art Database

Multi-Word Abbreviation Resolution

IP.com Disclosure Number: IPCOM000105355D
Original Publication Date: 1993-Jul-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 4 page(s) / 121K

Publishing Venue

IBM

Related People

Carlsen, KE: AUTHOR [+2]

Abstract

Given a list of hyphenated multi-word names, such as "this-is-one-name" and "another-name", efficiently locate zero or more matches of an abbreviation of one of these names. Abbreviations consist of a subset of the letters contained in each word of a full name, such as "t-i-o-n" for "this-is-one-name", or "an-nam" for "another-name". Cases of no match, single match, multiple matches (ambiguity) must be detected.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 46% of the total text.

Multi-Word Abbreviation Resolution

      Given a list of hyphenated multi-word names, such as
"this-is-one-name" and "another-name", efficiently locate zero or
more matches of an abbreviation of one of these names.  Abbreviations
consist of a subset of the letters contained in each word of a full
name, such as "t-i-o-n" for "this-is-one-name", or "an-nam" for
"another-name".  Cases of no match, single match, multiple matches
(ambiguity) must be detected.

      The abbreviation resolution algorithm efficiently locates zero
or more matches of a given abbreviation by creating optimized
sublists (index tables) that index the list of names and by breaking
the search into two parts.  Initially, the list of names is sorted
(in a special way) and two small index tables are created.  To
perform the search, the abbreviation is first scanned for what we
call the "prefix", or invariant, information and the list of full
names is searched with an efficient search (i.e., binary or hash
table) to locate all names with a matching prefix.  If a match of the
prefix information is found, the search continues with a scan of the
subset of names with matching prefixes.

      In a conventional comparison search problem, one assumption
that is relied upon is that the key (the item being searched for) is
completely specified and not ambiguous.  When resolving
abbreviations, the key (or abbreviation) specified is not complete
and may be ambiguous.  Information has been removed (for brevity) and
thus conventional non-trivial search algorithms (i.e., algorithms
other than an exhaustive linear search) do not work.

      In order to improve upon the exhaustive linear search approach
for abbreviations, it is necessary to isolate exactly what
information can be relied upon to be present in any given
abbreviation.  This invariant information (or "prefix" as we call it)
can then be used in an efficient conventional search algorithm to
significantly reduce the problem (search) space before resorting to
an exhaustive approach.

The Problem Space - The problem faced in developing this algorithm
was resolving abbreviations of hyphenated multi-word names.  This
problem and our solution can be generalized to cover the case of any
multi-word string delimited by any particular character.  The
discussion of the solution focuses on our use of the invention to
locate multi-word names where the delimiter character is a hyphen.

      In the following discussion, the term name refers to a complete
hyphenated multi-word name, such as this-is-one-name.  The term
abbreviation refers to any proper subset of the letters in each work
in a multi-word name, up to and including the full name itself.  At
least one letter from every word in the name must appear separated by
a delimiter (hyphen) in an abbreviation.  For the name
"this-is-one-name", examples of valid and invalid abbreviations are
listed below.

                valid           ...